Journal of Applied Psychology 


John G. Darley, Editor 
University of Minnesota 


Lorraine Bouthilet, Managing Editor 





Table of Contents 


Changes in Attitudes Toward a Low-Rent Housing Project: K. E. Clark and R. L. Jones.... 201 


Validity and Factor Analyses of Naval Air Training Predictor and Criterion Measures: J. T. 
Bair, R. F. Lockman, and C. T. Martoccia 


Dimensional Analysis of Motion: X. Experimental Evaluation of a Time-Study Problem: 
D. Hecker, D. Green, and K. U. Smith 


The Speed and Accuracy of Reading Horizontal, Vertical, and Circular Scales: N. E. Graham 228 


Evaluation of a Display Incorporating Quantitative and Check-Reading Characteristics: 


Comprehension by Reading versus Hearing: W. B. Webb and E. J. Wallon 
Role Perceptions of Successful and Unsuccessful Supervisors: E. E. Ghiselli and R. Barthol 241 
Job Expectancy and Survival: J. Weitz 


The Use of a Sentence Completion Test in Measuring Attitudes Toward Superiors and Sub- 
ordinates: L. S. Burwen, D. T. Campbell, and J. Kidd 


A Validation Study of the Prediction of College Achievement: J. W. Frick and H. E. Keener 251 


Predicting Grade Point Average with a Forced-Choice Study Activity Questionnaire: G. 
Schutter and H. Maher 


Fakability of a Forced-Choice Personality Test Under Realistic High School Employment 
Conditions: L. V. Gordon and E. S. Stapleton 


A Technique for Increasing the Reproducibility of Cumulative Attitude Scales: A. L. Edwards 263 


The Relationship Between Item Ambiguity and Discriminating Power in a Forced-Choice 
Scale: E. S. Isard 


Using “Mark Sense” for Ratings and Personal Data Collection: B. M. Bass and C. R. Wurster 269 


The Application of Temporal Correlation Techniques in Psychology: W. J. Merrill, Jr., and 
C. A. Bennett 





American Psychological Association 


Volume 40, Number 4 August, 1956 





Consulting Editors 


Harold E. Burtt, Ohio State University 

Alphonse Chapanis, Johns Hopkins Univer- 
sity 

Clifford E. Jurgensen, Minneapolis Gas 
Company 

Laurence S. McGaughran, University of 
Houston 

Quinn McNemar, Stanford University 


Alexander Mintz, City College of New York 
Harold F. Rothe, Fairbanks, Morse and 
Company 
Julian B. Rotter, Ohio State University 
Thomas A. Ryan, Cornell University 
Donald E. Super, Columbia University 
Miles A. Tinker, University of Minnesota 
Alfred C. Welch, University of New Mexico 





This journal gives primary consideration to origi- 
nal investigations in any field of applied psychol- 
ogy except clinical and consulting psychology, al- 
though a descriptive or theoretical article may be 
accepted if it represents a special contribution in 
an applied field. Quantitative investigations of in- 
terest or value to psychologists working in the fol- 
lowing broad fields will be considered: vocational 
and educational prognosis, diagnosis, and guidance 
at the secondary and college level; personnel re- 
search in business, industry, and government; bio- 
mechanics; industrial working conditions; research 
on opinion and morale factors; job analysis and 
classification research; market and advertising re- 
search. 


Because of the large number of manuscripts sub- 
mitted, authors should adhere to the rule of 
“brevity consistent with clarity.” The typical 
manuscript should run to approximately 4,000 
words. There is a lag of approximately twelve 
months between receipt and publication of an 
article. Authors may request advanced publica- 
tion if they are prepared to pay the cost of print- 
ing the necessary extra pages. 


Manuscripts should be addressed to the Editor, 
John G. Darley, 408 Johnston Hall, University of 
Minnesota, Minneapolis 14, Minnesota. All manu- 
scripts should be submitted in duplicate. Original 
figures are prepared for publication; duplicate fig- 
ures may be photographic or pencil-drawn copies. 

Manuscripts must conform to the style require- 
ments described in the “Publication Manual of the 
American Psychological Association,” Psychol. Bull., 
1952, 49, No. 4, Part 2. 





Journal of Applied Psychology 


Published bimonthly by the 
American Psychological Association 
Prince and Lemon Sts., Lancaster, Pa. 
and 1333 Sixteenth Street N.W. 
Washington 6, D. C. 


$8.00 per volume 


$1.50 per issue 


Subscriptions, orders, and business communications should be addressed to the American Psychological Association, 
1333 Sixteenth St. N.W., Washington 6, D. C. Address changes must reach the subscription office by the 10th of 
the month to take effect the following month. Undelivered copies resulting from address changes will not be replaced; 
subscribers should notify the post office that they will guarantee second-class forwarding postage. Other claims for 
undelivered copies must be made within four months of publication. 
Entered as second-class matter, August 19, 1943, at the post office at Lancaster, Pa., under the act of March 3, 1879. 
Acceptance for mailing at the special rate of postage provided for in paragraph (d-2), Section 34.40, P. L. & R. 
of 1948, authorized October 10, 1947. 


© 1956 by the American Psychological Association, Inc. 













Changes in Attitudes Toward a Low-Rent Housing Project 1 


Kenneth E. Clark and Robert L. Jones 


University of Minnesota 


The psychological and sociological litera- 
ture contains a number of studies which bear 
on reactions to public housing and commu- 
nity development (3, 4, 5, 6). Many such 
investigations deal with reactions of public 
housing tenants to their housing project and 
with their attitudes toward fellow tenants. 
Many have been single-shot cross-section 
studies. Few have provided for follow-up 
research on neighborhood reaction to a pub- 
lic housing project and to project residents 
over a period of time. This study is the third 
in a series reporting neighborhood reaction 
to and attitudes toward a public, low-rent 
housing project built in 1952 in a long-estab- 
lished residential area in a large midwestern 
city. It is designed to provide panel-type 
data on individual respondent opinions and 
household opinions in addition to providing 
more gross data on total sample responses 
from year to year. It examines patterns of 
attitude shift and reactions to several stages 
of “reality” in a community-change situation. 

The first article in this series (1) reported 
reactions of a fixed-address probability sam- 
ple of community residents in 1950 to the 
prospect of a housing development being built 
in the neighborhood after plans for its erec- 
tion had been approved by the city council, 
but before any construction had begun. In 
addition to opinion and attitude data, this 
initial study obtained responses to a set of 
information questions about the proposed 
project, estimates of the effect of the project 
on property values, taxes, school crowding, 
noise and traffic, and data on the extent to 
which respondents had participated in com- 
munity meetings about the project. 

1 The writers are indebted to the Graduate School 
of the University of Minnesota for financial support 
of the project described herein. Field work was per- 
formed by the professional staff of the Research Di- 
vision, School of Journalism, University of Minne- 
sota. 

The second study (2) was undertaken in 
1952 just after construction of the project 
had been completed but before any families 
had occupied the units. The 1950 question- 
naire was used again in the 1952 study with 
minor modifications. Interviewers called on 
all fixed-address households drawn in the ini- 
tial study sample and on an additional 192 
households which were assigned in order to 
increase the stability of breakdown analyses. 

A summary of the results of the earlier 
studies follows: 

1950 “Planning-Stage” study: As many 
neighborhood residents opposed the project 
as favored it, although a fourth of the sam- 
ple took no position on the matter; a “near” 
group of residents located within two blocks 
of the proposed site of the development was 
only slightly less favorable to the project 
than the remainder of the sample members; 
income level was not significantly related to 
favoring or opposing the project; people who 
had attended community meetings on the 
project were much more intense in their opin- 
ions and tended to favor the project; more 
than a third of the respondents thought prop- 
erty values would go down because of the 
project, while only 5 per cent thought they 
would go up because of it; more than 40 per 
cent (a plurality of the respondents on this 
item) thought the project would bring “un- 
desirable” people into the neighborhood and 
into the schools; more than a fourth indi- 
cated that the project would affect their plans 
to stay in the neighborhood; those who held 
more intense opinions on the project were 
better informed about it, and persons who at- 
tended community meetings on the topic were 
better informed than those who relied on the 
daily press or on conversation with others for 
their information. 

1952 “Completion-of-Construction” study: 
The proportion of neighborhood residents fa- 
voring the project increased from 39 to 45 per 
cent and the proportion opposing dropped 
from 38 to 31 per cent. The undecided pro- 
portion remained essentially the same as be- 
fore; this upturn in favorable attitude cut 
across all income levels; there was a substan- 
tial drop in proportion of respondents think- 
ing that the project would adversely affect 
property values or would bring “undesirable 
persons” into the neighborhood; many fewer 
persons indicated that the project would have 
any effect on their long term plans to con- 
tinue residence in the neighborhood; there 
was some loss in information level about the 
project; on the “core question” which asked 
whether the respondent favored or opposed 
the project there was a sizable shift from 
no opinion to favor, and from oppose to un- 
decided among identical respondents inter- 
viewed in both studies. 


Procedure 


In the early summer of 1954—two years after the 
housing project began to be occupied by tenants— 
the present study was completed. The 367 fixed ad- 
dresses which were used in the 1952 study were re- 
assigned to a team of professional interviewers. The 
questionnaire from the 1952 study was slightly modi- 
fied, mostly by dropping inapplicable construction- 
era questions and by changing verb tenses. The 
same key questions were asked again. Interviewers 
were assigned to interview a responsible adult at 
each fixed-address household and to maintain about 
a 50-50 split between men and women. Of the 367 
assigned addresses, five were unusable in 1954 be- 
cause the houses either had been torn down or were 
vacant at the time of the survey. Out of the 362 
remaining households, interviews were obtained with 
a responsible adult at 347, or 96 per cent. Seven 
householders refused to be interviewed (less than 2 
per cent) and at the remaining eight households no 
one was found at home even with three call-backs 
at different hours of the day. Most of these families 
appeared to be on vacation according to informa- 
tion from neighbors. The 96 per cent completion- 
of-interviews figure maintained the record of this 
interviewing crew of having completed 96 per cent 
of its assignments in each of the three surveys.2 

2 It might be mentioned that refusal to be inter- 
viewed in this study appeared to be more a situa- 
tionally-influenced phenomenon than a “trait” of 
uncooperativeness on the part of the refusing re- 
spondent. Initial refusals were encountered at 17 
households. Skilled follow-up interviewers succeeded 
in getting 10 of these cases to complete the ballot. 
Only one of the seven remaining cases was a house- 
hold at which an interview had been refused in 
1952. Three of the 1954 “hard refusals” were at 
households in the midst of some personal crisis at 
the time, usually illness. In four cases, households 
which had refused an interview in 1952 (and which 
were the same families in 1954) granted an inter- 
view in 1954. 

The first portion of the results reported here de- 
scribes total or gross change in reaction to the hous- 
ing project using data from all persons sampled in 
1950, 1952, and 1954. 

Because of such factors as families moving into 
and out of the neighborhood, vacation-taking, and 
refusals and because of the “any-responsible-adult” 
sampling plan within the fixed addresses, the total 
samples in the 1952 and 1954 follow-up studies were 
made up of four components: identical respondents 
from the preceding study, identical households (fami- 
lies) from the preceding study but not the same 
respondent, new respondents and new families in 
certain of the same fixed addresses, and “noncorre- 
sponding cases,” i.e., addresses at which an inter- 
view was obtained in one of the years, but not in 
the others. 

A breakdown of the 1954 sample compared with 
the 1952 sample on these four kinds of respondents 
will illustrate the composition of the latest sample: 
Of the 347 interviews completed in 1954, 119 were 
with identical respondents from the 1952 study, 139 
were with an alternate adult in the same house- 
holds, 75 were with respondents from entirely new 
families which had moved into the old fixed-ad- 
dress houses, and 14 were noncorresponding cases. 
Using 1950 respondents as a base (keeping in mind 
the addition of the new fixed addresses in 1952) the 
1954 sample has 49 identical respondents (these cases 
will be analyzed in panel fashion in a later sec- 
tion), 48 same-households, 72 same address-different 
family cases, and 178 noncorresponding cases. 


Results 
Total Samples 


Opposition to the housing project continued 
to decrease during the 1952-54 period. Thus 
at a time when the “reality” of the commu- 
nity change was highest, when the project 
was no longer just a plan or a group of un- 
tenanted buildings, more people favored and 
fewer opposed the project than in its earlier 
stages. Responses to the “core question,” Do 
you now favor or oppose the presence of this 
housing development in the neighborhood?, 
and an intensity question, How strongly do 
you feel about this?, are presented in Table 1 





alongside data from comparable questions 
from the 1950 and 1952 studies in which the 
question wording varied slightly to take ac- 
count of the planning stage and completion- 
of-construction stage of the project. 


Table 1 

Opinions Toward Low Rent Housing Project in 1950, 1952, and 1954 

                                                                    No Opinion or 
                              Favor              Oppose               Qualified 
                         1950  1952  1954   1950  1952  1954     1950  1952  1954 
Total group: Number        73   159   169     71   108    77       44    84   101 
             Per cent      39    45     …     38    31     …       23    24     … 

By intensity of feeling (per cent): 
  Very strongly            42     …     …     53     …     …        …     …     … 
  Rather strongly          33     …     …     33     …     …        …     …     … 
  Not strongly at all       …     …     …     13     …     …        …     …     … 
  No answer                 …     …     …      1     …     …        …     …     … 

  Total per cent            …     …     …    100     …     …        …     …     … 


It is interesting to note that the proportion 
of undecided respondents or persons with no 
opinion remained in the vicinity of a fourth 
of the total sample and even showed a slight 
increase in 1954. Further analysis of this 
undecided group in terms of length of resi- 
dence in the community revealed that there 
was no significant difference in proportion 
of undecided respondents among short-, me- 
dium-, and long-term residents of the neigh- 
borhood. The sizable incidence of undecided 
response, then, does not appear to reflect any 
lack of opportunity on the part of a group 
of short-term residents to become aware of 
the development. 

Response to the “core question” next was 
analyzed by various respondent income levels. 
In phrasing the income question in 1954, an 
additional $500 was added to each response 
category compared with the 1952 question. 
This was the writers’ estimate of the average 
income increment expected during the two- 
year period. A similar adjustment had been 
made in the 1952 question as compared with 
the 1950 question. The similarity between 
the 1952 and 1954 frequencies in income 
classifications following the $500 adjustment 
is very close and indicates that the estimate 
of typical income increment was rather ac- 
curate. A distribution of respondent incomes 
for all three studies is shown in Table 2. 
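
The bracket adjustment described above can be sketched in a few lines of Python. This is only an illustration and not the writers' procedure: the function name `shift_brackets` and the representation of a bracket as a (low, high) pair are assumptions made here, and the 1952 brackets are those listed in Table 2.

```python
def shift_brackets(brackets, increment=500):
    """Shift income brackets upward by a fixed increment.

    Each bracket is a (low, high) pair in dollars. A high of None marks the
    open-ended top bracket; a low of 0 marks the bottom bracket, whose floor
    stays at zero. Both conventions are assumptions made for illustration.
    """
    shifted = []
    for low, high in brackets:
        new_low = low + increment if low > 0 else 0
        new_high = high + increment if high is not None else None
        shifted.append((new_low, new_high))
    return shifted


# 1952 response categories as listed in Table 2.
brackets_1952 = [
    (5500, None),
    (4500, 5499),
    (3500, 4499),
    (2500, 3499),
    (1500, 2499),
    (0, 1499),
]

# Adding $500 to each category gives the 1954 response categories.
brackets_1954 = shift_brackets(brackets_1952)
```

Applied to the 1952 categories, the sketch reproduces the 1954 categories of Table 2, from "$6,000 up" down to "0-1,999".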


Table 2 

Reported Incomes of Respondents in 1950, 1952, and 1954 

               Income Level                           Per Cent 
    1950           1952           1954           1950   1952   1954 
$5,000 up      $5,500 up      $6,000 up            33     31     32 
4,000-4,999    4,500-5,499    5,000-5,999          20     16     16 
3,000-3,999    3,500-4,499    4,000-4,999          19     20     17 
2,000-2,999    2,500-3,499    3,000-3,999          16     15     14 
…              1,500-2,499    2,000-2,999           …      …      … 
…              0-1,499        0-1,999               …      …      … 
No answer                                           …      …      … 

Total                                             100      …      … 








Table 3 

                                                                           Per Cent       Per Cent      Per Cent Qualified 
                  Income Level                              N               Favor          Oppose        or No Opinion 
     1950              1952              1954          ’50  ’52  ’54    ’50  ’52  ’54   ’50  ’52  ’54    ’50  ’52  ’54 
$5,000 and up     $5,500 and up     $6,000 and up       62  108  109     44   47   50    35   34   26     21   19   24 
$3,000 to 4,999   $3,500 to 5,499   $4,000 to 5,999     74  125  114     39   52   49    35   28   19     26   20   32 
Less than $3,000  Less than $3,500  Less than $4,000    45   89  101     35   40   49    43   27   21     22   33   30 


Table 3 presents a breakdown of response 
to the “core question” by persons of high, 
medium, and low income. Noteworthy in the 
1954 data was an evening-out of “favor” re- 
sponses over all income classes in contrast to 
the pattern in the previous studies. This 
evening-out represents a substantial increase 
in the “favor” response on the part of the 
lowest income group over the four years. 
“Oppose” responses have declined in all in- 
come classifications, but least sharply in the 
highest income group. “Oppose” responses 
among persons in the lowest income bracket 
have decreased more than half between 1950 
and 1954. A new trend appears in the 1952- 
1954 data of Table 3 for undecided respond- 
ents. The table shows a large increase from 
1952 to 1954 in the number of persons in the 
middle-income group who are undecided or 
have no opinion about the project. This re- 
verses the trend for this response from 1950 
to 1952. 

Tables 4, 5, and 6 present responses to 
three questions requiring respondents to “pre- 
dict” effects of the project. The data are pre- 
sented for the total group in each of the 
three studies and also according to “core 
question” breakdowns isolating respondents 
who favored the project, opposed it, or were 
undecided, or had no opinion about it. 

Table 4 data on the likely effect of the 
project on property values indicates that fear 
of adverse effects had declined very sharply 
and in 1954 was no longer considered of any 
substantial importance except by the minor- 
ity who opposed the project. Even within 
this latter group, less than half felt that the 
project would reduce property values. Most 
of the shift in response to this question was 
from a “property values will go down” re- 
sponse to a qualified or don’t-know response. 
Clearly, very few respondents believed the 
project would increase neighborhood property 
values. 

Table 4 

Do You Think Property Values Will Go Up, Down, or Stay the Same? 

                                                  Go Up        Go Down      Stay Same       Other 
                                   N            (Per Cent)    (Per Cent)    (Per Cent)    (Per Cent) 
                            1950 1952 1954     ’50 ’52 ’54   ’50 ’52 ’54   ’50 ’52 ’54   ’50 ’52 ’54 
Total group                  188  351  347       5   4   4    35  28  13    50  60  55    10   8  28 
Favor project                 73  159  169       …   …   …     …   …   …     …   …   …     …   …   … 
Oppose project                71  108   77       …   …   …     …   …   …     …   …   …     …   …   … 
No opinion or qualified 
  opinion on project          44   84  101       …   …   …     …   …   …     …   …   …     …   …   … 


Table 5 indicates that the proportion of 
total respondents who in 1954 thought the 
project had brought undesirable people into 
the neighborhood (a “yes” response) is very 
nearly the same as the proportion of total 
respondents who in 1952 expected that the 
project would do so. Both these figures, 
however, are well below the 1950 expecta- 
tions on this matter. It appears, then, that 
planning-stage fears concerning effect of the 
project on the kinds of people who would be 
brought into the neighborhood were reduced 
over the 1950-54 span, but that no part of 
this change to a more favorable view took 
place after the residents moved in. 


Table 5 

Do You Think This Unit Will Bring Undesirable People Into Neighborhood? 

                                                   Yes            No             … 
                                   N            (Per Cent)    (Per Cent)    (Per Cent) 
                            1950 1952 1954     ’50 ’52 ’54   ’50 ’52 ’54   ’50 ’52 ’54 
Total group                  188  351  347       …  28  30     …   …   …     …   …   … 
Favor project                 73  159  169       …   …  20     …   …   …     …   …   … 
Oppose project                71  108   77       …  58  71     …   …   …     …   …   … 
No opinion or qualified 
  opinion on project          44   84  101       …   …   …     …   …   …     …   …   … 


Table 5 data, broken down by how people 
responded to the “core question,” show some 
interesting trends during the three stages of 
the housing project. It is seen that more 
than three fourths of the persons opposed to 
the project in 1950 believed that the project 
would bring undesirable people into the neigh- 
borhood. In 1952 the proportion was sub- 
stantially reduced, possibly as a consequence 
of the attractive physical appearance of the 
completed project.3 Then in 1954, after two 
years of occupancy of the project, the view 
of the opposed group returned to just about 
its initial pessimistic level. The pattern over 
the years for the group which favored the 
project is somewhat ambiguous. Among per- 
sons who were undecided or who had no opin- 
ion on the “core question,” however, there has 
been a steady trend toward belief that the 
project would not bring and had not brought 
undesirable people into the neighborhood. 

One result of the 1950 study which indi- 
cated the extent of opposition to the plans 
for the project was the rather sizable num- 
ber of persons who said that such a project 
would affect their long-term plans to remain 
as residents in the neighborhood. At that time 
over half of those opposed to the project and 
over a quarter of all respondents indicated 
that the project probably would influence 
them to leave the neighborhood. Table 6 
shows data for all three periods of time on 
this question for the total sample and for the 
favor, the oppose, and the no-opinion-unde- 
cided breakdowns on the “core question.” 

3 Evidence concerning reaction to the appearance 
of the project was obtained in 1954 from a question 
which asked, “Do you think in the long run this 
housing development will make the neighborhood 
look more attractive, look about the same, or look 
less attractive?” The more attractive response had 
a plurality and was selected twice as often as the 
less attractive alternative. Another question seek- 
ing reaction to the landscaping (lawns, trees, and 
shrubs) of the project was answered in the very 
good response category by 49 per cent of the sam- 
ple. Only 3 per cent said the landscaping looked 
“rather bad.” 

For the total group, for those who favored 
the project, and for those who had no opin- 
ion or who were undecided about it, the trend 
on the leave-the-neighborhood question is 
quite steady over the years. The project has 
decreased in importance as an influence on 
plans to stay in the neighborhood until only 
one person in 10 in the total sample in 1954 
regarded the project as a deterrent to his 
continued residence. In 1954 a majority 
even of those who opposed the project did 
not see it as influencing their continued resi- 
dence, although about a third of this group 
did say they have intentions to leave the 
neighborhood because of it. The extremely 
low “yes” percentages for the favor and for 
the undecided-no opinion groups indicate that 
the project is of almost no importance to these 
persons as a matter affecting continued neigh- 
borhood residence. 

Table 6 

Will Construction of Development Have Effect on Your Long-Term Plans to 
Stay or Move Out of Neighborhood? 

                                                   Yes            No           Other 
                                   N            (Per Cent)    (Per Cent)    (Per Cent) 
                            1950 1952 1954     ’50 ’52 ’54   ’50 ’52 ’54   ’50 ’52 ’54 
Total group                  188  351  347       …   …   …    64  77  85     …   …   … 
Favor project                 73  159  169      11   …   …    86  96  97     …   …   … 
Oppose project                71  108   77      55   …   …    28  47  51     …   …   … 
No opinion or qualified 
  opinion on project          44   84  101      11   6   4    84  79  93     …   …   … 


Another insight into this area of concern 
was provided in 1954 by a supplementary 
question asking each respondent whether he 
knew of anyone who had moved out of the 
neighborhood because of the housing project. 
Fifteen per cent of the respondents said they 
did know of such a case or cases. Breakdown 
analysis reveals that this 15 per cent is com- 
posed of twice as great a proportion of per- 
sons who opposed the project as it is of per- 
sons who favored or were undecided about it. 

Further evidence was obtained on the in- 
fluence of the project on continued neighbor- 
hood residence. Responses on the 1952 bal- 
lot for those families who actually left the 
neighborhood between 1952 and 1954 were 
analyzed. This analysis sheds light on the 
pre-moving attitudes of “actual” movers in- 
stead of basing attitude analysis on state- 
ments of intent to move. Data on the intent- 
to-leave question and the “core question” in 
1952 for those individuals who did leave the 
neighborhood between 1952 and 1954 indi- 
cate that there is no significant difference be- 
tween those who moved and those who stayed 
in 1952 responses to the intent-to-leave ques- 
tion. Further, those who moved and those 
who stayed are not significantly differentiated 
by their “core question” responses. In other 
words, persons who said that the project 
would affect their long-term plans to stay in 


the neighborhood have remained in the neigh- 
borhood in equivalent proportion to those who 
did not see the project as an influence on con- 
tinued residence. It seems, then, that expres- 
sions of project influence on plans to stay in 
the neighborhood, at least over a two-year 
period, are not correlated with actual moving 
behavior. 

Certain information questions concerning 
the project were included in the 1950, 1952, 
and 1954 questionnaires. In the earlier bal- 
lots, some questions concerning the physical 
appearance of the units were asked. On the 
assumption that there would be almost com- 
plete familiarity with such matters by 1954, 
these questions were dropped in 1954. This 
reduction left just three information questions 
common to all three surveys. Table 7 shows 
the trend over the years in information about 
aspects of the project covered by these items. 
Question 1 asked, “About how many families 
do you understand will be housed in this de- 
velopment?” Question 2 asked, “What is the 
most money a family can make a year and 
still rent a place in the development?,” and 
Question 3 inquired, “If an undesirable family 
gets into this development will the housing 
authority be able to get them out?” On both 
Questions 1 and 2, which required fairly spe- 
cific information, there was a sharp reduction 
in correct responses from 1952 to 1954. Less 
than one person in 10 knew the correct an- 
swer to these questions in 1954, whereas from 
a fourth to a third knew the answers in 1952. 
On Question 3 there was an increase in the 
number of persons knowing that the housing 
authority could evict undesirable families from 
the development. These data suggest that as 
the issue of the housing development passed 
from the planning stage to the building stage 
to the stage of an accomplished residential 
fact, the specific information level of neigh- 
borhood residents about the project declined 
sharply. This decline was proportionately 
great across the categories of favorableness- 
unfavorableness toward the project. 


Table 7 

Correct Responses to Information Questions in 1950, 1952, and 1954 

                                                Question 1    Question 2    Question 3 
                                   N            (Per Cent)    (Per Cent)    (Per Cent) 
                            1950 1952 1954     ’50 ’52 ’54   ’50 ’52 ’54   ’50 ’52 ’54 
Total group                  188  351  347       …   …   …     …   …   …     …   …   … 
Favor project                 73  159  169       …   …   …     …   …   …     …   …   … 
Oppose project                71  108   77       …   …   …     …   …   …     …   …   … 
No opinion or qualified 
  opinion on project          44   84  101       …   …   …     …   …   …     …   …   … 

Responses in each of the study years to a 
set of “appearance” and “project nuisance” 
items are shown in Table 8. No significant 
trend occurs over the years in response to 
a question about whether the project will 
make or has made the neighborhood more 
attractive physically. Two questions, one on 
whether shopping has been made more easy 
or more difficult and one on whether the proj- 
ect has made the neighborhood a more or less 
pleasant place to live, show sharp trends 
away from either a definite “more” or a 
“less” answer and toward an “unchanged” 
response. These data indicate that actual 
experience with the project as a neighbor- 
hood entity leads to a judgment by a vast 
majority of respondents that things are about 
the same as before. 


Table 8 

Responses to “Appearance” and “Project Nuisance” Questions in Three Stages of Project Reality 

                                              1950                 1952                  1954 
                                        (Planning Stage)   (Construction Stage)   (Occupancy Stage) 
Question and Response                    N    Per Cent       N    Per Cent          N    Per Cent 

Will/Has the project make/made the 
neighborhood look more attractive? 
    More attractive                      …       …           …       …              …       … 
    Less attractive                      …       …           …       …              …       … 
    About the same                       …       …           …       …              …       … 
    DK/no opinion                        …       …           …       …              …       … 

Will/Has the project make/made 
shopping harder? 
    Easier                               …       …           …       …              …       … 
    Harder                               …       …           …       …              …       … 
    No change                            …       …           …       …              …       … 
    DK/no opinion                        …       …           …       …              …       … 

Will/Has project make/made neigh- 
borhood more pleasant place to live? 
    More pleasant                        …       …           …       …              …       … 
    Less pleasant                        …       …           …       …              …       … 
    No change                            …       …           …       …              …       … 
    DK/no opinion                        …       …           …       …              …       … 

Will/Has project make/made 
neighborhood more noisy? 
    Yes                                  …       …           …       …              …       … 
    No                                   …       …           …       …              …       … 
    DK/no opinion                        …       …           …       …              …       … 

Will/Has project create/created 
a traffic nuisance? 
    Yes                                  …       …           …       …              …       … 
    No                                   …       …           …       …              …       … 
    No opinion                           …       …           …       …              …       … 

Two questions about physical nuisances 
(noise and traffic hazards) resulting from 
the project show strong and significant trends 
toward a “no nuisance” response in 1954 
after two years of exposure of neighborhood 
residents to any such nuisance which the 
project might cause. Experience with the 
project, then, has sharply lessened earlier ex- 
pectations concerning these physical nuisance 
matters. 


Identical Respondents 


The preceding analysis has described total- 
sample changes in neighborhood reaction to 
the housing development but has not shown 
important data on shifts in attitudes on the 
part of individuals. Although names of re- 
spondents were not taken in the interviews, 
accurate respondent matching was possible 
through responses on sex, age, education, oc- 
cupation, and length of residence in the dwell- 
ing unit. In addition, a number of 1954 re- 
spondents commented to field personnel that 
they had been interviewed in one of the previ- 
ous studies. 

Two sets of panel-type results are avail- 
able on individuals, one based on data from 
identical respondents who were interviewed 
in the 1950 and in the 1954 study, and an- 
other based on identical respondents inter- 
viewed in the 1952 and in the 1954 studies. 
The major analyses reported here are for the 
1950-1954 group of 49 cases who represent 
identical individuals interviewed at the earli- 
est and at the latest stages of inquiry con- 
cerning the project. 

Table 9 shows responses on the “core question”
for this four-year panel group.

Table 9

Comparison of Responses of Same Individuals on
“Core Question” in 1950 and 1954

                                  1954 Response
                                  Qualified or
1950 Response            Oppose   No Opinion    Favor   Total
Favor                       3                     13      18
Qualified or no opinion                                   11
Oppose                                                    20
Total                                                     49

A rather striking trend emerges from this
panel table. It is clear that the greatest
amount of shift is from an earlier “oppose”
or “no-opinion” response to a more favorable
subsequent response. Fifteen persons shifted
to a more favorable view while only five have 
shifted to a less favorable one. Nearly a 
third of all respondents who in 1950 or in 
1952 were opposed to the project swung all 
the way over to the “favor” response by 1954. 
Half of those who shifted from “oppose” to
some other answer shifted all the way to 
“favor.” The remainder shifted, of course, 
to “no opinion” or “qualified.” This is in 
contrast to a similar analysis of identical re- 
spondents from 1950 to 1952 reported in an 
earlier article (2) where the gains in a fa- 
vorable direction for a same-individual panel 
group were very largely from no opinion to 
favor or from oppose to no opinion. 

Data from the 1952-54 identical-respond- 
ent cases corroborate the trends of Table 9. 
For this group of 119 cases, 35 shifted to a 
more favorable answer during the two years 
while only 16 shifted to a less favorable view. 
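
The asymmetry in these shift counts can be checked with an exact sign test. The sketch below is an illustration only: the article reports no such test, and the function name is hypothetical. It asks how likely a split at least as lopsided as 15 versus 5 (or 35 versus 16) would be if favorable and unfavorable shifts were equally likely.

```python
from math import comb


def sign_test_two_sided(n_up: int, n_down: int) -> float:
    """Exact two-sided sign test p-value under a 50/50 null hypothesis."""
    n = n_up + n_down
    k = min(n_up, n_down)
    # Probability of the smaller count being k or fewer in one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Shift counts quoted in the text (favorable, unfavorable).
p_1950_54 = sign_test_two_sided(15, 5)
p_1952_54 = sign_test_two_sided(35, 16)
print(round(p_1950_54, 4), round(p_1952_54, 4))
```

Respondents who did not shift are ignored by this test, so it speaks only to the direction of change among those who changed.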

There is a suggestion in Table 9 that those 
respondents who in 1950 gave “no opinion” 
or “undecided” responses and who shifted to 
another answer in 1954 shifted mostly to “fa- 
vor.” This trend is strongly supported by 
the 1952-54 data where two-thirds of per- 
sons giving initial no opinion or qualified an- 
swers in 1952 changed their view in 1954 to 
“favor.” 

The greater stability of an initial “favor” 
response compared with an initial “oppose” 
response is indicated by the following propor- 
tions: Seventy-two per cent of identical persons
who favored the project in 1950 favored
it in 1954. This is supported by a 69 per 
cent figure for 1952-54 identical respondents 
who favored the project in 1952 again favor- 
ing the project in 1954. By comparison, 40 
per cent of identical persons who opposed the 
project in 1950 opposed it in 1954. For 
1952-54 identical respondents, 55 per cent 
of those who opposed the project in 1952 op- 
posed it in 1954. 
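
Several cells of Table 9 are not legible in this copy, but the figures quoted in the text constrain them. The following sketch works through that arithmetic. It is a reconstruction, not a reading of the printed table: it assumes the quoted percentages are rounded to the nearest whole per cent, that the legible 3 in the Favor row belongs to the Oppose column, and the variable names are illustrative.

```python
# Row totals legible in Table 9 (1950 response).
row_total = {"favor": 18, "qualified": 11, "oppose": 20}
n = sum(row_total.values())  # 49 panel members

# Stable cells implied by the quoted percentages.
favor_favor = round(0.72 * row_total["favor"])     # 72 per cent stayed "favor"
oppose_oppose = round(0.40 * row_total["oppose"])  # 40 per cent stayed "oppose"

# Half of those who left "oppose" went all the way to "favor".
left_oppose = row_total["oppose"] - oppose_oppose
oppose_favor = left_oppose // 2
oppose_qualified = left_oppose - oppose_favor

# Fifteen shifts were favorable and five unfavorable.
qualified_favor = 15 - oppose_favor - oppose_qualified
left_favor = row_total["favor"] - favor_favor
qualified_oppose = 5 - left_favor
qualified_qualified = row_total["qualified"] - qualified_favor - qualified_oppose

# Assumption: the legible 3 in the Favor row is the Oppose column.
favor_oppose = 3
favor_qualified = left_favor - favor_oppose

# Implied 1954 marginals and their percentages.
col_total = {
    "oppose": favor_oppose + qualified_oppose + oppose_oppose,
    "qualified": favor_qualified + qualified_qualified + oppose_qualified,
    "favor": favor_favor + qualified_favor + oppose_favor,
}
pct = {k: round(100 * v / n) for k, v in col_total.items()}
print(col_total, pct)
```

Under these assumptions the implied 1954 marginals can be compared with the same-individual percentages the article quotes for the “core question” (Favor 45, Oppose 22, No Opinion/Undecided 33), which gives a check on the reconstruction.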

Highlights of 1950-54 individual panel data
analysis on other questions include such
findings as:


1. Less than a third of those persons who 
in 1950 expected that property values would 
go down as a result of the project responded 
that values actually had gone down by 1954. 
In fact, all persons in this panel group who 
said in 1954 that property values had gone 
up between 1950 and 1954 were in the group 
who believed in 1950 that the project would 
reduce property values. 

2. More than two-thirds of the same indi- 
viduals who said in 1950 that the project 
would make the neighborhood a less pleasant 
place in which to live shifted by 1954 to a 
more favorable position—largely to a re- 
sponse of “no change” in neighborhood pleas- 
antness. 

3. Decided shifts occurred on two physical 
nuisance items. Two-thirds of the individu- 
als who thought in 1950 that the project 
would make the neighborhood more noisy 
shifted to a more favorable response in 1954. 
More than half of those who thought the 
project would create a traffic nuisance shifted 
to a more favorable response in 1954. 

4. A shift in a favorable direction was 
noted on the question, “Are people in the 
housing units about the same as others in 
the neighborhood?” A 13 per cent marginal 
gain in the “Yes” response was picked up 
about equally from initial “No” and “Don’t-know”
respondents. Very little shift from an
initial “Yes” to a subsequent “No” response 
occurred. 

5. The item with the greatest amount of 
turnover was the one inquiring whether the 
project would make the neighborhood physi- 
cally more attractive. The marginal totals 
on this item showed a small gain in the
direction of a more attractive response.
Internally, however, the table showed a very great
number of shifts. The marginal gain was 
due largely to about half of the persons who 
initially thought the neighborhood would look 
“about the same” shifting to a “more attrac- 
tive” response in 1954. Counterbalancing 
this, less than half of those who initially 
thought the project would make the neigh- 
borhood more attractive in 1950 still thought 
so in 1954. 


A further value of these panel data on 
same-individuals is the light they shed on the 
total sample results reported in the preceding 
section. If the trend in responses of the 
same individuals over the years turned out 
to be systematically different from “trends” 
in the cross-section response totals for the 
complete sample, then doubt would be raised 
concerning the accuracy of interpreting year- 
by-year differences in cross-section totals in 
terms of neighborhood “changes” in attitude. 
What might appear to be shifts toward more
favorable views in the total sample over time
could actually reflect, for example, the views
of less involved and hence more neutral or more
favorably disposed new residents at the
neighborhood fixed addresses.

The marginal frequency proportions in 1954 
for the 1950-54 same-individuals are not sig- 
nificantly different, however, from remainder- 
of-sample responses on any of the attitude 
items on the ballot. Marginal data for same- 
individuals for 1954 from Table 9 give an 
example of this correspondence on the “core 
question.” The 1950-54 same-individual re- 
sponses on this question in 1954 are Favor— 
45 per cent, Oppose—22 per cent, and No 
Opinion/Undecided—33 per cent. The re- 
mainder-of-sample proportions are 49, 22, 
and 29 per cent respectively. These findings 
indicate that the contribution to the total 
sample attitude picture made by new resi- 
dents, noncorresponding cases, and the like 
is in accord with the pattern of attitude shifts 
discernible through the analysis of identical 
individuals interviewed over the years. This 
lends strength to the total sample results as 
indicative of trends in neighborhood reaction 
to the housing development. 







Same Households, Different Respondents 


In 187 instances in 1954 an interview was 
obtained at a fixed address with a responsible 
adult who was a member of the family of the 
person interviewed at an earlier date, but not 
the previous interviewee himself. In 48 of 
these cases, the families were the same fami- 
lies interviewed in 1950, and in 139 cases the 
families were the same families interviewed 
in 1952. 

Results from these “family-member panels” 
on certain of the main questions in the ballot 
will be presented here. 

On the “core question,” the marginals in
Tables 10 and 11 show that 1954 responses 
from same-household respondents are quite 
different from responses of the remainder of 
the sample and from responses of same-in- 
dividuals in showing no increase in the “fa- 
vor” response in 1954. Table 10, in fact, 
shows a slight decrease in the “favor” re- 
sponse. In contrast to the data in the previ- 
ous section on same-individuals, shifts from 
previous views in the same-family sample are 
almost equally distributed in more favorable 
and less favorable directions. A combination
of data from Tables 10 and 11 indicates that
49 same-family members who shifted from an
earlier view expressed by an alternate adult
respondent in the family shifted to a less
favorable view; 48 of the shifters moved to a
more favorable view.

Table 10

Comparison of Responses of Same-Family Respondents
on “Core Question” in 1950 and 1954

                                  1954 Response
                                  Qualified or
1950 Response            Oppose   No Opinion    Favor   Total
Favor                       4                     14      22
Qualified or no opinion
Oppose
Total

Table 11

Comparison of Responses of Same-Family Respondents
on “Core Question” in 1952 and 1954

                                  1954 Response
                                  Qualified or
1952 Response            Oppose   No Opinion    Favor   Total
Favor                       8         17          42      67
Qualified or no opinion    12                             39
Oppose                     12                             33
Total                      33         41          65     139

Relative stability of response on the “core
question” for same-individual and same-household
respondents was analyzed to give some
notion of the comparative firmness of view 
for these classes of respondents. Same-house- 
hold respondents gave only 48 per cent re- 
sponses on the “core question” identical with 
the earlier reply of an alternate household 
member. This is substantially lower than
the corresponding percentage, 57, for
same-individuals. This and the preceding data in
this section suggest that a moderate amount
of division of opinion exists within
households.

Through the remainder of the question- 
naire items, the 1950-54 same-household re- 
spondents deviated further in marginal item 
totals from the remainder of the sample than 
did the same-individuals. A characteristic of 
same-family respondents in 1952-54 was the
giving of substantially fewer “Don’t-know,”
“No-opinion,” or “Qualified” answers to the
various attitude items than nonpanel
respondents. Same-family respondents tended to have
much more definite views. The item which 
most characterized this tendency was the one 
inquiring whether property values had gone 
up, gone down, or stayed the same since the 
project was begun. Pooling all cases of
same-family respondents shows that only about a
sixth had no opinion on this item while more
than a third of the rest-of-sample
respondents, excluding same-individuals, had no
opinion. This is reasonable in view of the greater
opportunity of same-family respondents to be- 
come aware of property value trends over a 
time period. 







Summary 


In 1950, shortly after city government
approval was given to plans for a low-rent
public housing project in an established
residential neighborhood of a large midwestern city,
interviews were conducted with neighborhood 
residents to determine their opinions about 
the project. Information questions about the 
project also were included. A fixed-address 
sampling plan was employed. Results from 
this study were called “planning-stage” data 
and represented views of neighborhood resi- 
dents at an early stage of project reality. 
Opinions were not anchored to any physical 
and tangible neighborhood change but were 
held with respect to less tangible plans and 
prospects. 

In 1952, when construction of the project 
had been completed but before any tenants 
had moved in, a second study was completed, 
using the same set of questions and the same 
fixed addresses plus another set of fixed
addresses drawn to expand the sample. Results
from this study were called “construction-stage”
data and represented opinions anchored to the
physical reality of the finished housing, but
not to the human and social reality of the
presence of occupants.

In 1954, after tenants had been occupying 
the project for two years, a third study was 
completed using many of the items from the 
earlier ballots and using the same fixed-ad- 
dress sample. Results from this study were 
called “occupancy-stage” data and represented 
opinions anchored both to the physical re- 
ality of the housing and to the human and 
social reality of the occupants as neighbors 
in the community area. 

Results showed that: 


1. For the total samples, and for nearly all 
questions, a definite trend toward more fa- 
vorable opinions toward the project and its 
occupants was discerned from the planning 
to the construction to the occupancy stages. 

2. On a “core question” asking directly 
about approval or disapproval of the project 
a steady increase in “favor” and a decrease 
in “oppose” responses was noted. This trend 
cut across all income classes, but was most 
pronounced for the lower income groups. 



3. Economic-centered fears that the project 
would lower property values were largely dis- 
pelled over the four-year period. Few re- 
spondents believed that their taxes had gone 
up because of the project. 

4. Responses to questions concerning the 
effect of the project on the attractiveness of 
the neighborhood and its pleasantness as a 
place to live show little trend to any definite 
view over the years. It seems that the physi- 
cal and social “anchors” for attitudes toward 
the housing development since 1950 have, if 
anything, affected responses to these questions 
in a direction of “no change.” 

5. Responses to person-centered questions 
inquiring whether the project has brought un- 
desirable people into the neighborhood and 
whether project tenants are like or unlike 
people in the rest of the neighborhood show 
slight tendencies toward more favorable views 
toward the tenants. There has been no sig- 
nificant change in these questions since ten- 
ants have moved into the project. 

6. Many fewer persons in the latest study 
say the project is affecting their long-term 
plans to remain as residents in the neighbor- 
hood. Analysis of the earlier responses of 
persons who actually moved out of the neigh- 
borhood during the period of occupancy of 
the project indicates no significant difference 
on this intent-to-remain-in-neighborhood ques- 
tion between this group of “movers” and the 
rest of the sample who stayed. 

7. A group of physical nuisance items on 
neighborhood noise levels, traffic nuisances, 
school crowding, and shopping difficulty show 
strong shifts to more favorable responses dur- 
ing the period of actual occupancy of the 
project when an experience-based anchor for 
opinions was present. 

8. Two information questions about the 
project which required rather specific knowl- 
edge to answer correctly showed considerable 
decline in proportion of correct response from 
1950 and 1952 to 1954. One more general 
information question was answered correctly 
by more persons in 1954. These data suggest
that as the project became less of a
planning-stage “issue” in the neighborhood
and was instead an accomplished fact, the
level of detailed information about it among
neighborhood residents declined.

9. Panel-type analysis of the opinions of 
identical individuals who were respondents in 
1950 and again in 1954 demonstrated that 
such individuals shifted in the same fashion 
as indicated by total-sample results. For these 
people, there was much greater stability on a 
“favor” response on the “core question” than 
on “oppose.” Over the years many of those 
who shifted from “oppose” shifted all the 
way to “favor” on this question. Decided 
shifts in a favorable direction were noted on 
property-value expectations, on neighborhood 
pleasantness, and on physical nuisance items 
involving the project. Same-individual data 
from 1952 to 1954 corroborated most of these 
findings. 

10. Panel-type analysis of respondents from 
same families interviewed in previous studies, 
but not the same individuals previously inter- 
viewed, showed that data from these same- 
family respondents departed on several ques- 
tions from the total-sample trend and from 
same-individual patterns. This suggests that 
a goodly amount of within-household opinion 
variance on the housing project exists. 

11. Of incidental interest is that for the
third time, 96 per cent of assigned
fixed-address households yielded completed
interviews. The number of refusals in all years
was under 2 per cent, thus demonstrating the
effectiveness of a highly-trained interviewing
staff and a persistent call-back field plan.


Received October 19, 1955. 


References 


1. Clark, K. E., & Swanson, C. E. Neighborhood
reaction to public low-rent housing. J. appl.
Psychol., 1951, 35, 342-347.

2. Clark, K. E., & Swanson, C. E. Attitudes toward
public low-rent housing, before and after
construction. J. appl. Psychol., 1953, 37, 201-206.

3. Festinger, L., & Kelley, H. H. Changing
attitudes through social contact. Ann Arbor:
Research Center for Group Dynamics, Univer.
of Michigan, 1951.

4. Housing Authority of Baltimore City. A study
of the attitudes of potential applicants toward
public housing. Baltimore: Housing Authority
of Baltimore City, 1954.

5. Merton, R. K., West, Patricia S., Jahoda, Marie,
& Selvin, H. C. Social policy and social
research in housing. J. soc. Issues, 1951, 7,
Nos. 1-2.

6. Wilner, D. M., Walkley, Rosabelle, & Cook, S. W.
Residential proximity and inter-group
relations in public housing projects. J. soc.
Issues, 1952, 8, 45-69.





The Journal of Applied Psychology 
Vol. 40, No. 4, 1956 


Validity and Factor Analyses of Naval Air Training 
Predictor and Criterion Measures 


John T. Bair, Robert F. Lockman,¹
and Charles T. Martoccia ²


U.S. Naval School of Aviation Medicine 


World War II pilot candidate selection re- 
search resulted in test batteries that were ef- 
fective in predicting flight training success in 
the Navy and Air Force (5, 6). Some valid 
Air Force predictors, however, were not in- 
cluded in the Navy selection battery, particu- 
larly tests of spatial and perceptual abilities 
(7). 

A recent Navy-Air Force joint research 
project, the Pilot Candidate Selection Re- 
search Program (PCSRP), involved the ad- 
ministration of 69 experimental tests to a 
population of 2,126 Navy midshipmen about 
to enter flight training (11). Validity data 
from this project indicated that spatial and 
perceptual ability tests yielded some of the 
highest correlations with a pass-fail flight 
training criterion (12). On the basis of this 
evidence the Spatial Apperception Test was 
added to the Navy Flight Aptitude Rating 
selection battery (1). This test was pat- 
terned after the Air Force Aerial Orientation 
test which correlated .34 with graduation or 
elimination for academic reasons, and .31 with 
completion or elimination for flight proficiency 
reasons in the PCSRP study. Preliminary 
validity evidence warranted retention of the 
Navy test in the selection battery, but more 
extensive investigation of the relations of 
spatial and perceptual abilities with flight 
training criteria is needed. This was the ma- 
jor purpose of the present study. The spe- 
cific objectives were: (a) to investigate cer- 
tain spatial and perceptual tests together with 
other measures of differential abilities in re- 
lation to both academic and flight training 
grades in the Naval Air Training Program, 
(b) to determine more information on the
factorial structure of these spatial and perceptual
abilities, and (c) to provide data to aid
construction of more reliable and valid measures
of spatial, perceptual, and other differential
abilities in relation to success in the Naval
Air Training Program.

1 Now at the Bureau of Naval Personnel,
Washington, D. C.

2 Opinions or conclusions herein are those of the
authors and do not necessarily reflect the views or
possess the endorsement of the Navy Department.


Procedure 


A battery of seven standardized spatial and per- 
ceptual ability tests was administered to a group of 
125 naval aviation cadets in D Stage of basic flight 
training during the fall of 1952. This was the in- 
strument and radio instruction stage which in 1952 
followed primary flight training. To obtain cri- 
terion data, the sample was followed through to the 
completion of training. The sample consisted of 108 
cadets who completed training and were designated 
naval aviators. Seventeen cadets were excluded from 
the original sample because of attrition or incom- 
plete records. 

Scores on six other differential ability tests were 
included as variables. Four of these were adminis- 
tered during pre-flight school; the other two had 
been administered in the initial cadet selection bat- 
tery. 

Three pre-flight school academic grades and four 
flight-training grades were included as criterion vari- 
ables. A description of the predictor and criterion 
variables follows: 


Tests Administered During D Stage 


1. Revised Minnesota Paper Form Board (Series 
MA): requires the selection of an appropriate as- 
sembled two-dimensional geometric figure after men- 
tal manipulation of the unassembled parts. A reli- 
ability of .92 has been reported (13). 

2. DAT Space Relations (Form A): measures an 
ability to visualize a constructed object from a pat- 
tern and how the object would appear if rotated in 
various ways. An additional feature of this test is 
the mental manipulation of objects in three-dimen- 
sional space. A mean reliability of .93 has been re- 
ported (2). 

3. Guilford-Zimmerman Spatial Orientation (Form 
A): measures an ability to evaluate the spatial po- 
sition of objects with reference to the human body. 
It requires awareness of whether or not one object 
is to the right or left, higher or lower, and nearer or 
farther from another object. A reliability of .88 has 
been reported (9). 

4. DAT Clerical Speed and Accuracy (Form A): 
measures speed of response in simple perceptual tasks
requiring the selection of a proper number or letter
combination from a series of other combinations. A 
mean reliability of .87 has been reported (2). 

5. Minnesota Clerical-Number Comparison (Part 
1 of the Minnesota Clerical Test): requires quick and 
accurate comparisons of number combinations for 
similarities or differences. A reliability of .76 has 
been reported (13). 

6. Minnesota Clerical-Name Comparison (Part 2 
of the Minnesota Clerical Test): requires the rapid 
and accurate comparison of proper name combina- 
tions for similarities or differences. A reliability of 
.83 has been reported (13). 

7. Topographical Orientation Test: measures
orientation to specific geographic points. Using a compass
rose as a reference and Chicago and Pensacola as 
points of origin, the examinee indicates the direction 
to 10 cities in the United States and 10 foreign cities 
by the shortest possible route without the aid of a 
map. The test score is an error score derived by 
summing an individual’s deviations from the initial 
great circle headings for each city. A reliability of 
.86 has been reported (3). 
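
The error score described for the orientation test in item 7 can be illustrated with the standard formula for the initial great-circle heading between two points. This is a sketch only: the scoring details beyond what is stated above are not given in the article, and the coordinates, the examinee's answers, and the function names below are assumptions made for illustration.

```python
from math import atan2, cos, degrees, radians, sin


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle heading from point 1 to point 2, degrees from north."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dlon = radians(lon2 - lon1)
    x = sin(dlon) * cos(phi2)
    y = cos(phi1) * sin(phi2) - sin(phi1) * cos(phi2) * cos(dlon)
    return degrees(atan2(x, y)) % 360


def heading_error(answer_deg, true_deg):
    """Smallest absolute angular difference between two headings."""
    diff = abs(answer_deg - true_deg) % 360
    return min(diff, 360 - diff)


def error_score(origin, targets, answers):
    """Sum of heading deviations over all target cities."""
    return sum(
        heading_error(answer, initial_bearing(*origin, *target))
        for target, answer in zip(targets, answers)
    )


# Approximate coordinates (latitude, longitude); assumed for illustration.
chicago = (41.88, -87.63)
pensacola = (30.42, -87.22)
denver = (39.74, -104.99)

# Hypothetical examinee answers, in degrees from north.
score = error_score(chicago, [pensacola, denver], [180.0, 270.0])
print(round(initial_bearing(*chicago, *pensacola), 1), round(score, 1))
```

Taking the smaller of the two angular differences keeps an answer of 350 degrees against a true heading of 10 degrees at 20 degrees of error rather than 340.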


Tests Administered in the U. S. Naval School, Pre- 
Flight 


8. ACE Psychological Examination-L (1947 edi- 
tion): includes sentence completion items, artificial 
language, and vocabulary same-opposites. A reli- 
ability of .95 has been reported for this language 
section of the 1938 edition (13). 

9. ACE Psychological Examination-Q (1947 edi- 
tion): includes arithmetic reasoning, figure analogies, 
and number-series items. A reliability of .87 has 
been reported for this quantitative section of the 
1938 edition (13). 

10. GED Correctness and Effectiveness of Expres- 
sion (College Form A): measures the understanding 
of certain skills in English usage, particularly gram- 
mar and spelling, at the college freshman level. No 
reliabilities have been reported. 

11. Essentials of Mathematics: measures certain 
elementary mathematical skills. In Part I the items 
measure proficiency in addition, subtraction, multi- 
plication, and division of whole numbers, fractions, 
and decimals. Part II covers general reasoning prob- 
lems requiring high school algebra and geometry. 
This test was developed at the U. S. Naval School, 
Pre-Flight, and no reliabilities have been reported. 


Tests Administered as Part of the Cadet Selection 
Battery 


12. Aviation Classification Test (Forms 3 and
4):³ measures general academic intelligence and
includes sections on vocabulary; meter and dial
reading; judgment; mathematics; number, name, and
symbol comparisons. A reliability of .92 has been
reported (5).

13. Mechanical Comprehension Test (Forms 3 and
4): requires comprehension of the nature, operation,
and effects of various physical principles rather than
knowledge of specific tools or equipment. A
reliability of .87 has been reported (5).

3 This test has recently been replaced by a newer
form called the Aviation Qualification Test (4).


Criterion Variables: U. S. Naval School, Pre-Flight 
Grades 


14. Final Navigation Grade: includes the average 
of weekly quizzes and a final examination for nine 
weeks of dead-reckoning navigation and five weeks 
of celestial navigation. The quizzes and the ex- 
amination each are weighted 50 per cent in the final 
grade, which is converted to standard score form 
(as are all other grades in pre-flight school). 

15. Final Engines Grade: includes the average of 
weekly quizzes and a final examination, each weighted 
50 per cent. The course involves basic understand- 
ing of aircraft engines and their operation. 

16. Ground School Final Grade: computed at the
end of pre-flight school as a weighted average of
final grades in Navigation, Naval Orientation,
Engines, Aerology, Principles of Flight, Physical
Training, Military, and Study Skills.


Criterion Variables: Flight Training Grades 


17. Final K Stage (Basic) Grade: includes 10 in- 
structional and two check flights in field carrier 
landing practice. On each flight, the student is given 
a mark by his instructor of AA (above average), 
A (average), BA (below average), or U (unsatis- 
factory) on each of several maneuvers and pro- 
cedures. Letter grades are accumulated for the 12 
flights. Then a numerical grade is determined by 
multiplying each AA by four, each A by three, each 
BA by two, and each U by one. These weighted 
scores are summed and divided by the total number 
of letter grades, and converted to standard score 
form to obtain the final K stage grade. 

18. Final L Stage (Basic) Grade: includes six air- 
craft-carrier landings and is determined in a manner 
similar to K stage described above. 

19. Final Basic Flight Grade: includes all AA, A, 
BA, and U letter grades accumulated for 104 in- 
structional and check flights covering the 11 stages 
of basic flight training.⁴ These stages (in addition
to K and L described above) were A—primary solo, 
B—precision, C—acrobatics, D—instruments and 
radio, E—night, F—formation, G—gunnery, H— 
primary combat, and I—cross-country navigation. 
The accumulated letter grades for these stages were 
weighted, averaged, and converted to standard score 
form in the manner described above. 

20. Final Advanced Flight Grade: includes all
advanced flight training grades which are computed in
the same way as final basic flight grades. Advanced
training stages are, in general, extensions of basic 
training stages, although varying with the type of 
aircraft in which the cadet specializes. 
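
The numerical grade described for K Stage in item 17 can be sketched as a weighted mean of letter marks. The weights are those stated above; the example marks and the function name are invented for illustration, and the final conversion to standard score form is omitted because its scale is not specified in the text.

```python
# Weights stated in the text for each letter mark.
WEIGHTS = {"AA": 4, "A": 3, "BA": 2, "U": 1}


def numerical_grade(marks):
    """Weighted mean of the letter marks accumulated over a stage's flights."""
    if not marks:
        raise ValueError("at least one letter mark is required")
    return sum(WEIGHTS[mark] for mark in marks) / len(marks)


# Hypothetical marks accumulated across several maneuvers and flights.
example_marks = ["AA", "A", "A", "BA", "A", "A", "U", "A"]
grade = numerical_grade(example_marks)
print(grade)
```

Because the divisor is the total number of letter marks, a cadet graded on more maneuvers is not advantaged over one graded on fewer.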

Means, standard deviations, and product-moment
correlation coefficients were computed for these 20
variables. Table 1 presents the means and standard
deviations and Table 2 the correlation (and residual)
matrix.

4 A cadet is given extra flights if a regular flight is
graded incomplete or unsatisfactory. These extra
flights, however, are not included in the final grade.

Table 1

Means and Standard Deviations of Predictor
and Criterion Variables *

Predictor                              Mean       SD
 1. MPFB                              51.31     7.21
 2. DAT-SR                            75.38    13.86
 3. GZ-SO                             31.59    10.30
 4. DAT-CS&A                          63.10    11.21
 5. Minn. Clerical, Numbers          117.43    23.18
 6. Minn. Clerical, Names            123.44    27.98
 7. TO                                85.57    26.64
 8. ACE-L                             49.72    11.28
 9. ACE-Q                             44.88    10.21
10. English                           48.60    10.88
11. Math                              40.94    13.87
12. ACT                               85.77    10.75
13. MCT                               60.10     6.51

Criterion                              Mean       SD
14. Navigation grade (PF)             49.49     7.16
15. Engines grade (PF)                49.39     7.78
16. Pre-flight ground final grade     50.36     5.63
17. K-Stage grade (Basic)              2.95      .08
18. L-Stage grade (Basic)              2.87      .21
19. Final basic flight grade           2.99      .07
20. Final advanced flight grade        3.05      .06

* The predictor and criterion variables are listed in the same
order as in the body of the text.


Analyses and Results 
Validity Data 


The variables in Table 2, the correlation 
matrix, can be classified into two general 
categories: (a) training grades or proficiency 
criteria, and (b) psychological measures of 
aptitudes and abilities potentially predictive 
of training grades.⁵ No significant
correlations were found with K or L Stage grades.
All other criterion variables correlated signifi- 
cantly with three or more predictor variables. 

5 Significant validity coefficients are shown in
boldface type in Table 2.

The classification of variables as predictors
or criteria facilitates analysis for the “best”
combinations of multiple validity. Jenkins’
improved short-cut method for multiple
correlation (10) was used for this purpose, and
the results are given in Table 3. For each
criterion, the multiple correlation coefficient
with MCT and ACT scores is also presented
to contrast the validity of these standard
cadet selection tests with that of the best
combination of all predictor measures.
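
For the simplest case of two predictors, such as MCT and ACT, a multiple correlation can be obtained directly from three product-moment correlations. The sketch below uses the standard two-predictor formula, R² = (r_y1² + r_y2² − 2·r_y1·r_y2·r_12) / (1 − r_12²). It is not Jenkins’ short-cut method (10), whose steps are not described here, and the correlation values are invented for illustration.

```python
from math import sqrt


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)


# Hypothetical validities and predictor intercorrelation.
r_multiple = multiple_r_two_predictors(0.40, 0.30, 0.50)
print(round(r_multiple, 3))
```

When the two predictors are positively intercorrelated, as in this example, the second adds little to the first; a useful battery combines predictors that are valid but only weakly related to one another.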

The batteries of experimental predictor 
measures so derived resulted in significantly 
greater multiple validities for all grade cri- 
teria, except final basic flight grade, than 
those obtained with ACT and MCT. Seven 
predictors in varying numbers and combina- 
tions accounted for the multiple correlations 
achieved: Essentials of Mathematics, GED 
Correctness and Effectiveness of Expression, 
Minnesota Paper Form Board, Navy MCT, 
Minnesota Clerical (Names and Numbers), 
ACE-L, and Topographical Orientation. 

It is interesting to note that none of the 
predictor variables related significantly to K 
and L Stage grades and the correlation be- 
tween these two stage grades was .28. This 
seems to be an unexpectedly low relationship 
for grades given the same individuals in two 
successive training stages presumed to be 
highly related, that is, field carrier landing 
practice and actual carrier landings. 


Factor Analysis 


Four significant factors were extracted from 
the original correlation matrix using Thur- 
stone’s centroid method (14). The unrotated 
factor matrix is presented in Table 4. The 
residuals after extraction of the fourth factor 
are included in Table 2. There were but two 
significant residuals remaining at this point, 
meeting Guilford’s criterion for a stopping 
place in factoring (8). 
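
The first step of a centroid extraction can be sketched as follows, using the usual textbook description: with communality estimates on the diagonal, each variable's loading is its column sum divided by the square root of the grand total, and the residual matrix is what remains after removing the products of those loadings. This is a minimal sketch, not the authors' worksheet; the small matrix is invented, and the sign reflection needed before extracting later factors is not shown.

```python
from math import sqrt


def first_centroid_factor(corr):
    """Return (loadings, residual) for the first centroid factor.

    corr: square symmetric matrix as a list of lists, with communality
    estimates on the diagonal.
    """
    n = len(corr)
    col_sums = [sum(corr[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    if total <= 0:
        raise ValueError("grand total must be positive")
    loadings = [s / sqrt(total) for s in col_sums]
    residual = [
        [corr[i][j] - loadings[i] * loadings[j] for j in range(n)]
        for i in range(n)
    ]
    return loadings, residual


# Hypothetical correlations; each diagonal entry is the largest
# off-diagonal value in its column, a common communality estimate.
example = [
    [0.6, 0.6, 0.5],
    [0.6, 0.6, 0.4],
    [0.5, 0.4, 0.5],
]
loadings, residual = first_centroid_factor(example)
print([round(a, 3) for a in loadings])
```

Each column of the residual matrix sums to zero, which is why some variables must be reflected in sign before the next centroid factor can be taken out.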

The four factors were rotated orthogonally ⁶
into a satisfactory approximation of
a simple structure using Zimmerman’s
graphical method (15). Factor loadings after four
rotations are given in Table 5. A loading of
.40 or greater is considered significant. All
four factors were overdetermined by
Thurstone’s criterion (14).
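
An orthogonal rotation of one pair of factors can be sketched as a plane rotation of two columns of the loading matrix. In a graphical method the angle is chosen by inspecting a plot of the two factors; the sketch below only shows the effect of a given angle. The loadings, the angle, and the function names are invented for illustration.

```python
from math import cos, radians, sin


def rotate_pair(loadings, i, j, angle_deg):
    """Rotate factor columns i and j of a loading matrix by angle_deg."""
    c, s = cos(radians(angle_deg)), sin(radians(angle_deg))
    rotated = []
    for row in loadings:
        new_row = list(row)
        new_row[i] = c * row[i] + s * row[j]
        new_row[j] = -s * row[i] + c * row[j]
        rotated.append(new_row)
    return rotated


def communalities(loadings):
    """Sum of squared loadings for each variable."""
    return [sum(a * a for a in row) for row in loadings]


# Hypothetical unrotated loadings on two factors.
unrotated = [[0.6, 0.3], [0.5, -0.4], [0.2, 0.7]]
rotated = rotate_pair(unrotated, 0, 1, 30.0)
print([round(h, 3) for h in communalities(rotated)])
```

Because the rotation is orthogonal, each variable's communality is unchanged; only the distribution of its variance across the factors shifts.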

It is noted that criterion variables were in- 
cluded in the factorization. They have face 
validity in that they represent extensions of 
the predictor variables into everyday training 
situations. In addition, they aid in factor 
interpretation. 

Variable 5 has the highest loading on Fac- 


® The authors are indebted to LTJG H. Paul Kel- 
ley, MSC, USNR for an oblique solution which re- 
sulted in a verification of the factors derived or- 
thogonally. 








“paqzWo JuJOd ;eUIaP ay PUL sadefd OM] 07 INOJ WOIJ PINpad aJaM STENpPIsas PUL SPUIOYJIO UOTPL[IIOD » 


sjenpisoy 1039" YWNOY 


coO— ¢0- 2-— 2- 80 £0- ZO WYSIp psouvape [eury 
10 t0—- 00 ¢€0-— - 80- 90 WSIP seq [euly 
9- 10- 2 ZI- W- LO- 80 (oIseg) a3eys ‘J 
cO—- ¢£0-— 90 z0 10— 00 z0 (oIseg) a3eys y 
40 00 9- 9- 00 00 Lo- Ad [euyg punoiy 
co 8006S0.ClCHOCSOsd 00 Ad seusug 

¢0 zO— 00 F0O— 00 so- Ad “wourwatNn 
I¢ 90—- 90- 00 40 90- LOW 
rs =sgt co 86©6F0- =—0 z0 LOV 
9 86Ll¢ 40 gt +0 wie 
6b It 40 90 t0-— ysyjsuq 
Ig #1 s0-— 10 so- O-dOV 
te 867 9¢ tO 8636pt0—--—s S80 so— TAOV 
i- se-— 90 W—- OL 
Lr 10 60- 40 10 SoweN “Pwo “UU 
hy 6F trO-— 99 00 "SON ‘[ROWATD “UUTY 
02 00 9¢ 60 V¥SO-LVa 
ce re Sz I¢ £0- OS-ZD 
I¢ se 8t— SI 1Z 00 US-LVa 
62 o¢ zw- lz 9€ aqadWw 


al I 6 l 9 ¢ sal 





_ SOL [eNpisay puv uoryryaso0y 


Z AGeL 


3 
™ 
LS 
S 
Ss 
~ 
~ 
S 
~ 
= 
NY 
“ 
© 
~ 
x 
S 
Pa 
Y 
s 
= 
S 
e 
7 
= 
“= 
$ 
~ 
Ry 
~ 
x 
L 
S 
Ss 
Re 
> 
™_= 
S 
% 
NY 
= 
= 
S) 
~ 





Naval Air Training Predictor and Criterion Measures 


Table 3 

Multiple Validity Data * 

Grade criteria: Navigation; Engines; Pre-flight ground final; Final basic flight; Final advanced flight. 

Predictors named in the table: MPFB, MCT, English, MC-Names, Math, ACT, ACE-L. 

[The assignment of predictors to criteria and the multiple correlations are not legible in this copy.] 

* In order of contribution to explained variance. 


tor I and requires the accurate and rapid per- 
ception of similarities and differences between 
two sets of numbers. Variable 4 has the next 
highest loading and involves speed and ac- 
curacy in comparing number and letter com- 
binations; Variable 1 requires speed of per- 
ception of two-dimensional spatial objects and 
their mental manipulation. Variable 13 re- 
quires an ability to perceive the operation of 
common mechanical tools and items and to 
determine various physical principles from 
these operations. Factor I, then, can be de- 
scribed as perceptual analysis with speed of 
visualization of numbers, letters, and two di- 
mensional objects playing a major role and 
verbal comprehension assuming a negligible 
role. This factor is unique in that there 
were no significant loadings on it for any of 


Table 4 

The Unrotated Factor Matrix * 

Variable Number and Description 

 1. Minn. Paper Form Board 
 2. DAT Space Relations 
 3. G-Z Spatial Orientation 
 4. DAT Clerical 
 5. Minn. Clerical—Numbers 
 6. Minn. Clerical—Names 
 7. Topographical Orientation 
 8. ACE Psych. Exam.—L 
 9. ACE Psych. Exam.—Q 
10. GED English 
11. Mathematics 
12. Aviation Classification 
13. Mechanical Comprehension 
14. Navigation PF 
15. Engines PF 
16. Pre-flight ground final 
17. K Stage (Basic) 
18. L Stage (Basic) 
19. Final basic flight 
20. Final advanced flight 

[Columns: Factors I, II, III, IV, and Communality. Most entries are not legible in this copy. Entries that survive, in printed order and without row assignment: Factor III: -22, -07, -15, -22, -50, 41, -16, 20; Factor IV: 21, 40, 30; Communality: 40, 44, 52, 40, 67, 63.] 

* Decimal points omitted. 





John T. Bair, Robert F. Lockman, and Charles 


T. Martoccia 


Table 5 

The Rotated Factor Matrix * 

Column headings: Factor I, Perceptual; Factor II, Academic Potential; Factor III, Comprehension of Relationships; Factor IV, Applied Spatial. 

Variable Number and Description      I    II   III    IV   Communality 

 1. Minn. Paper Form Board          60    13    10    21    43 
 2. DAT Space Relations             06    48    10    44    44 
 3. G-Z Spatial Orientation         28    40    07    53    52 
 4. DAT Clerical                    64   -03    06    11    43 
 5. Minn. Clerical—Numbers          77    32   -01    03    69 
 6. Minn. Clerical—Names            23    12    76    06    65 
 7. Topographical Orientation       10   -36   -32   -15    26 
 8. ACE Psych. Exam.—L             -03    54    58    02    63 
 9. ACE Psych. Exam.—Q              17    31    61    16    52 
10. GED English                    -10    60    41    15    -- 
11. Mathematics                     14    65    --    11    -- 
12. Aviation Classification         07    51    --    --    -- 
13. Mechanical Comprehension        57    29    --    --    -- 
14. Navigation                      25    73    --    --    -- 
15. Engines                         28    65    --    --    -- 
16. Final ground school             25    86    --    --    -- 
17. K Stage                        -04   -25    --    --    -- 
18. L Stage                         00   -14    --    --    -- 
19. Final basic flight              17   -29    --    --    -- 
20. Final advanced flight           26    02    --    --    -- 

(-- marks an entry that is not legible in this copy.) 

Factor Intercorrelations (Rank-Order Method) 

I-II      .00 
I-III    -.34 
I-IV     -.12 
II-III    .30 
II-IV    -.27 
III-IV   -.09 

* After four rotations final communality values differ slightly from the unrotated values due to the graphical method of rotation and rounding errors. 


the criterion variables. It accounts for 11 
per cent of the total variance. 
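
The percentage of total variance attributed to a factor can be checked from its column of loadings. The sketch below assumes the usual convention for an orthogonal solution with standardized variables (sum of squared loadings divided by the number of variables); the authors do not state how they computed the figure, so this is an illustration rather than their procedure. The loadings are the Factor I entries of Table 5 with decimal points restored.

```python
# Factor I loadings for the 20 variables, transcribed from the Factor I
# column of Table 5 (decimal points restored).
factor_i = [0.60, 0.06, 0.28, 0.64, 0.77, 0.23, 0.10, -0.03, 0.17, -0.10,
            0.14, 0.07, 0.57, 0.25, 0.28, 0.25, -0.04, 0.00, 0.17, 0.26]


def proportion_of_variance(loadings):
    """Sum of squared loadings divided by the number of variables.

    Assumes orthogonal factors and standardized variables, so that each
    variable contributes one unit of total variance.
    """
    return sum(a * a for a in loadings) / len(loadings)


share = proportion_of_variance(factor_i)
print(f"Factor I: {100 * share:.1f} per cent of total variance")
```

Under these assumptions the result rounds to the 11 per cent reported for Factor I.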

Variables 16 and 14 have the highest load- 
ings on Factor II. These are academic train- 
ing grades: final pre-flight school grade and 
final pre-flight navigation grade, respectively. 
Variables 11, 10, 8, and 12 deal with aca- 
demic ability involving the diagnosis of prob- 
lems and the development of rules and prin- 
ciples from a set of objects. Variable 15 is 
another academic training grade, final pre- 
flight engines grade. The remaining vari- 
ables, 2 and 3, require an ability to educe 
spatial orientation concepts useful in tech- 
nical academic courses such as navigation. 
Factor II can be identified as an academic 
potential factor particularly applicable to 
technical academic work. It accounts for 20 
per cent of the total variance. 


Again, decimal points have been omitted. 


Variable 6 has the highest loading on Fac- 
tor III. It involves rapid and accurate dis- 
crimination between two sets of proper names. 
Variables 12, 9, and 8 have the next highest 
loadings, and all require the comprehension of 
concepts and their application to new situa- 
tions. Variable 19 is the total basic flight 
training grade and could be considered the ap- 
plication of principles and procedures learned 
in ground school and in flight training. Vari- 
able 10 measures the understanding of cor- 
rect language usage rather than factual knowl- 
edge. Factor III could be described as 
comprehension of relationships, particularly 
as related to the understanding of oral and 
written instructions and the application of 
rules and principles to actual flight situa- 
tions. Whereas Factor II seems to involve 
more of an inductive reasoning process, Fac- 
tor III seems to be primarily deductive in 
nature. Factor III accounts for 14 per cent 
of the total variance. 

Variable 17, final flight training grade in 
field carrier landing practice, has the highest 
loading on Factor IV. It involves an ability 
to estimate the position of a field landing 
area and other objects in relation to different 
positions of the airplane. Variable 19 in- 
cludes these same elements for all stages of 
basic flight training. Variable 3 requires the 
ability to visualize other objects in relation 
to body position, and Variable 2 involves the 
visualization and mental manipulation of 
three-dimensional spatial objects. It appears 
that Factor IV is an applied spatial relations 
factor dealing primarily with the relationship 
of objects in three dimensions. It accounts 
for 7 per cent of the total variance. 

Although Table 5 indicates fairly low fac- 
tor intercorrelations, there is a logical de- 
pendence among three of them. Factor II 
includes the development and learning of 
principles required for flight training essen- 
tially on a didactic level. Factor III in- 
cludes the ability to apply these principles to 
over-all flight training in a general manner, 
and Factor IV involves the discrete ability to 
apply necessary concepts of spatial relations 
to specific flying situations. 


Summary and Conclusions 


The chief purpose of this investigation was 
to interrelate certain tests of spatial and per- 
ceptual abilities, tests of other differential 
abilities, and proficiency measures in the 
Naval Air Training Program. It was found 
that: 


1. The most substantial relationships ex- 
isted between tests of academic aptitude and 
grades in the pre-flight phase of training. 

2. Tests of spatial and perceptual abilities 
correlated highest with final basic and ad- 
vanced flight grades. 

3. Four significant factors derived by fac- 
tor analysis were: perceptual, academic po- 
tential, comprehension of relationships, and 
applied spatial relations. 

4. Although the inclusion of criterion vari- 
ables did not reveal any new factors, it did 
aid considerably in defining those factors 
found. 

5. Since only 51 per cent of the total vari- 
ance was accounted for by the four factors 
described, there may well be other factors 
that would account for some of the variables 
employed. It is also possible that some of 
these variables would cluster with factors 
still unidentified. 


Received September 22, 1955. 


References 


 1. Ambler, Rosalie K. Preliminary evaluation of 
    two forms of the Spatial Apperception Test. 
    U. S. Naval Sch. Aviat. Med., 1953, Project 
    NM 001 057.04.04. 

 2. Bennett, G. K., Seashore, H. G., & Wesman, 
    A. G. Manual for the Differential Aptitude 
    Tests. New York: Psychological Corp., 1952. 

 3. Clark, W. B., & Malone, R. D. The relationship 
    of topographical orientation to other psychological 
    factors in naval aviation cadets. U. S. 
    Naval Sch. Aviat. Med., 1952, Project NM 001 
    059.01.32. 

 4. Davis, F. B. Development of the Aviation 
    Qualification Test (Forms 5 and 6). 1953. 
    (Contract Nonr 758[008].) 

 5. Fiske, D. W. Validation of naval aviation cadet 
    selection tests. J. appl. Psychol., 1947, 31, 
    601-614. 

 6. Flanagan, J. C., et al. (Eds.) AAF Aviation 
    Psychology Program. Washington: U. S. Government 
    Printing Office, 1947. (AAF Aviat. 
    Psychol. Program Res. Rep. Nos. 1-19.) 

 7. Guilford, J. P. (Ed.) Printed classification 
    tests. Washington: U. S. Government Printing 
    Office, 1947. (AAF Aviat. Psychol. Program 
    Res. Rep. No. 5.) 

 8. Guilford, J. P. Psychometric methods. New 
    York: McGraw-Hill, 1936. 

 9. Guilford, J. P., & Zimmerman, W. S. A manual 
    for the Guilford-Zimmerman Aptitude Survey. 
    Beverly Hills, Calif.: Sheridan Supply 
    Co., 1947. 

10. Jenkins, W. L. An improved method for multiple 
    R. Educ. psychol. Measmt, Summer, 1952, 
    12, 316-322. 

11. Page, H. E. The pilot candidate selection research 
    program: historical background and 
    organization. USAF Sch. Aviat. Med. Proj. 
    Rep., 1950, Proj. No. 21-29-008. (Rep. No. 
    NM 001 057.04.01.) 

12. Payne, R. B., Rohles, F. H., & Cobb, B. B. The 
    pilot candidate selection research program: 
    test validation and intercorrelations. USAF 
    Sch. Aviat. Med. Proj. Rep., 1952, Proj. No. 
    21-29-008; BuMed Project No. NM 001 057. 

13. Super, D. E. Appraising vocational fitness. New 
    York: Harper, 1949. 

14. Thurstone, L. L. Multiple-factor analysis. Chicago: 
    Univer. of Chicago Press, 1947. 

15. Zimmerman, W. S. A simple graphical method 
    for orthogonal rotation of axes. Psychometrika, 
    1947, 11, 51-55. 





The Journal of Applied Psychology 
Vol. 40, No. 4, 1956 


Dimensional Analysis of Motion: X. Experimental 
Evaluation of a Time-Study Problem ¹ 


Donald Hecker, Donovan Green, and Karl U. Smith 


The University of Wisconsin 


Time-study technique represents a method 
of measurement of human performance in the 
industrial task from which quantitative stand- 
ards of work output may be derived. In the 
direct application of time-study technique, the 
engineer breaks up a task into its elements, 
called therbligs, and times these elements 
separately. Standards of output are derived 
from these measures by eliminating or cor- 
recting for those elements and conditions of 
the task that do not represent the “critical” 
factors and conditions defining the actual mo- 
tion generally to be performed. 

The most widespread use of time-study 
technique involves the direct method just de- 
scribed. Increasingly, however, industry is 
making use of predetermined time standards, 
i.e., a set of values derived from timing a re- 
stricted area of work, which are generally ap- 
plied to other industrial tasks. The use of 
such restricted time measurements and stand- 
ards has stimulated interest in the scientific 
validity of all time-study concepts and prac- 
tices. 

The scientific investigation of time-and- 
motion study may take several directions. 
One approach has involved the study of the 
accuracy of the rating methods used in cor- 
recting or leveling obtained times in order to 
establish a standard of performance (1, 3). 
Another line of attack involves evaluation of 
the methods of measurement and the validity 
of their application. 

Several studies (2, 4, 5, 6, 7, 8) have been 
conducted thus far which show that there are 
grave limitations in the concept of independent 
movement elements, to which a standard 
of fixed time can be assigned as a result of 
measurement. The work thus far conducted 
along this line has involved observation of 
the differential effects of learning, of fatigue, 
of perception, and of the distance of movement 
upon the separate therbligs making up 
a motion. 

1 This research has been supported by funds from 
the Graduate School Research Committee, the University 
of Wisconsin. 

The present study attempts to investigate 
directly a basic problem of present time- 
study methods and of the application of pre- 
determined time standards in industry. To 
what extent are the component movements or 
therbligs in motion interdependent? Will the 
changing of the nature of one movement affect 
the duration of an adjacent motion in the 
task? Specifically, as this study deals with 
the problem, will different types of manipula- 
tive movements bring about a variation in the 
duration of a travel movement of a fixed 
length? 

There are only very limited prior observa- 
tions on the problems of interaction of com- 
ponent movements in motion. Limitations in 
methods of motion analysis hitherto used 
have restricted investigation of such prob- 
lems. One of the main aspects of the present 
research has been to overcome such limita- 
tions in experimental methodology by de- 
velopment and application of special elec- 
tronic techniques of motion study. 


Methods 


As just noted, special electronic methods of mo- 
tion analysis have been devised to conduct this in- 
vestigation. In addition to these methods, a pre- 
planned work situation is used to control and vary 
the perceptual and reactive characteristics of the 
performance task. 


Apparatus 


Figure 1 presents a photograph of the apparatus 
used in this experiment. The work panel, contain- 
ing in this case two vertical rows of four push but- 
tons each, is shown to the right in the figure. The 
electronic motion analyzer is housed in the small 
switching unit, shown to the far left. The two 
electronic interval timers located on the back of 
this table are used to measure separately the travel 
and manipulation aspects of motion. 

During an experimental observation on this particular 
setup, the subject operator (S) stands before 
the large work panel and, on instruction, pushes 
with his preferred hand the push-button switches 
located on the work panel. He starts with the top 
switch on the left, crosses to the top one on the 
right, back to the second one on the left, and so on 
until all switches are pushed. When S makes contact 
with the first switch, he automatically activates 
one of the precision interval timers, the one used 
to measure the duration of manipulation. As long as 
he stays in contact with this first switch, this manipulation 
timer runs. As soon as S breaks contact 
with the first switch, to move to the second, the 
manipulation clock stops and the second, the travel 
movement timer, starts to run. This timer continues 
until the second push button is touched, when it 
stops and the manipulation timer is turned on for a 
second time. As S presses the buttons successively, 
the two clocks automatically record the duration of 
each manipulation and travel movement in the task. 
When the last push button switch is operated, the 
two clocks stop automatically. 

Fig. 1. Experimental set-up showing the work panel to the right with the two manipulation 
boards in place. The motion analyzer and electronic interval timers are shown 
on the table to the back. 

The electronic motion analyzer can be thought of 
as consisting of two circuits, Sm and St. One of 
these circuits is an open circuit, Sm, which is closed 
by S when he touches any of the knobs or switches 
on the work panel. The other circuit, St, is closed 
when this first circuit is open. 

Figure 2 illustrates diagrammatically the circuits of 
the motion analyzer. The S is shown as the circle 
at the bottom of the diagram. The dotted lines represent 
the sweep of his arm from the manipulation 
board on the left to the manipulation board on 
the right. Four identical manipulation devices are 
mounted on each of the removable boards on the 
panel. Each of these devices is connected in common 
to the left side of a double switching circuit, 
Sm. A metallic rod which the operator holds in his 
nonpreferred hand is also connected with this side 
of the switching relay. When the operator manipulates 
a knob on the work panel, the subject circuit, 
Sm, is closed and the second circuit, St, automatically 
opens. This closing of the subject circuit starts 
the precision timer, Mt, which measures the time of 
contact or manipulation time. When S breaks contact 
with the first knob and travels toward the next, 
the subject circuit is reopened, and the other circuit, 
St, is closed, activating the second timer, Tt. On 
successive manipulation and travel movements, the 
durations of these component movements are accumulated 
on the timers. Touching the lower knob 
or switch on the right stops all recording. 
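
The alternation of the two timers can be sketched in software. The following is only an illustrative model of the logic described above, not a description of the vacuum-tube circuit; the event times, the `Timers` class, and the `accumulate` function are hypothetical names and values introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Timers:
    manipulation: float = 0.0  # accumulated contact time (timer Mt)
    travel: float = 0.0        # accumulated time between contacts (timer Tt)


def accumulate(contacts):
    """Accumulate manipulation and travel time over one trial.

    contacts: list of (touch_time, release_time) pairs in seconds, one per
    switch, in the order pressed. Touching the last switch stops both
    timers, so the last contact interval is not recorded.
    """
    timers = Timers()
    for (touch, release), (next_touch, _) in zip(contacts, contacts[1:]):
        timers.manipulation += release - touch   # Sm closed: Mt runs
        timers.travel += next_touch - release    # St closed: Tt runs
    return timers


# Hypothetical trial: eight switches, each held 0.1 s, touched every 0.5 s.
contacts = [(0.5 * i, 0.5 * i + 0.1) for i in range(8)]
totals = accumulate(contacts)
print(f"manipulation {totals.manipulation:.3f} s, travel {totals.travel:.3f} s")
```

With eight switches the model accumulates seven manipulation intervals and seven travel intervals, which matches the counts given in the Procedure.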

Several features of this high-precision electronic 
motion analyzer should be noted. The S is not 
stimulated by the current passing through his body 
inasmuch as it is at subthreshold level. Only 
vacuum-tube relays are used in this special design 
of the analyzer circuit. Accordingly its precision is 
limited only by the rapidity of the emission characteristics 
of the tubes. Finally, electronic interval 
timers are employed that provide time registration 
to an accuracy approaching 1/100,000th of a second. 
Because of calibration differences between interval 
timers, we estimate that readings to an absolute accuracy 
of .001 seconds are obtained. 

Fig. 2. Diagram of the circuit relations involved in 
the electronic method of motion analysis. [Diagram labels: timers Mt and Tt; SUBJECT.] 


Procedure 


Eight different types of manipulation were studied 
in this experiment: (a) a clockwise turning of the 
hand, used to operate a knob-type turn switch; (b) 
a vertical switching movement, used to throw a 
toggle switch downward; (c) a pushing movement 
of the thumb, needed to press the push button 
switches shown in Fig. 1; (d) a pulling movement 
on a small latch device, carried out by grasping the 
latch between thumb and forefinger; (e) a dial- 
setting motion, requiring rotation of a 2-inch dial 
arm on a marked dial face; (f) a thumb-forefinger 
pressure movement, made by squeezing a latch de- 
vice; (g) a lateral switching motion to the right 
hand, using a toggle-type switch; and (h) a counter- 
clockwise turn movement, using the same type of 
turn switch as in a. 

The different devices used to obtain these eight 
different kinds of manipulations were mounted on 
boards as already described in connection with Fig. 
1. These manipulation boards could be quickly 
mounted on and removed from the work panel so 
that the type of manipulation performed could be 
changed from trial to trial in an experimental ses- 
sion. All the latches, switches, and knobs used in 
the experiment could be moved easily, involving ma- 
nipulation movements of 0.25 to .5 inches. The hori- 
zontal distances between switches or knobs on the 
two sides of the panel were kept constant at 24 
inches. 

This study was divided into two separate experi- 
ments, the first consisting of observations based on 
the first four of the eight types of manipulations 
listed above, and the second involving observations 
based on the last four. The procedure for each of 
the two experiments was identical. Male and female 
college students from elementary and intermediate 
courses in psychology were used as subjects. In per- 
forming a given type of manipulation, S stood be- 
fore the work panel operating each of the identical 
manipulation devices on the two sides of the panel 
alternately, beginning with the uppermost device on 
the left board and ending with the lowest on the 
right board. A total of eight manipulation and 
seven travel movements occurred, therefore, in any 
single trial. Since touching the last switch stops the 
clocks, only seven manipulation movements were re- 
corded. 

In a given experimental period Ss were given three 
trials on each of the four types of manipulation 
used. Each experiment was run for four successive 
days. The four conditions of manipulation in each 
experiment, which are the main experimental variables, 
give 24 possible sequences of observation. In 
order to eliminate sequence effects, 24 Ss were assigned 
randomly to the 24 possible sequences. Accordingly, 
all possible sequences of the four conditions 
of manipulation were used in each experiment, 
and one S was assigned to each sequence. On each 
day, then, each S was given three consecutive trials 
on each condition of manipulation in turn, giving 12 
trials in all. 
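
The counterbalancing scheme (four conditions, 4! = 24 orders, one subject per order) can be sketched as follows. The subject labels, the seed, and the `assign_sequences` helper are hypothetical; the paper does not describe how the random assignment was actually carried out.

```python
import itertools
import random

# The four manipulations of the first experiment.
CONDITIONS = ["turn", "toggle", "push", "pull"]


def assign_sequences(subjects, conditions, seed=None):
    """Randomly assign each subject one of the possible condition orders."""
    sequences = list(itertools.permutations(conditions))
    if len(subjects) != len(sequences):
        raise ValueError("need exactly one subject per possible sequence")
    rng = random.Random(seed)
    rng.shuffle(sequences)
    return dict(zip(subjects, sequences))


subjects = [f"S{i:02d}" for i in range(1, 25)]  # hypothetical labels
plan = assign_sequences(subjects, CONDITIONS, seed=0)
for subject in subjects[:3]:
    print(subject, plan[subject])
```

Because every permutation is used exactly once, each condition appears equally often in each ordinal position across subjects.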

The recorded data of this study are the separate 
manipulation and travel times for each trial. The 
median scores for each component movement, based 
on the three trials for each condition of manipula- 
tion, were used in the analysis of the data. Separate 
analyses of variance were carried out for the travel- 
time scores in the two experiments. 
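
The scoring step can be illustrated with a brief sketch. The trial values below are hypothetical and serve only to show how one median score per condition is obtained from three trials; they are not data from the study.

```python
from statistics import median

# Hypothetical travel times in seconds: three trials per condition for one
# subject on one day (illustrative values only).
trials = {
    "turn":   [2.31, 2.18, 2.25],
    "toggle": [2.52, 2.47, 2.90],  # one unusually slow trial
    "push":   [2.81, 2.76, 2.79],
    "pull":   [2.66, 2.70, 2.61],
}

# One score per condition: the median of the three trials.
scores = {condition: median(times) for condition, times in trials.items()}

for condition, score in scores.items():
    print(f"{condition:7s} {score:.2f}")
```

In this example the single slow toggle trial does not shift the score, which is one practical reason a median of three may be preferred to a mean.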


Results ² 


The main results of this study deal with 
the question, will the variation in the type of 
manipulation produce significant differences in 
the duration of a travel movement of con- 
stant length? The findings will be discussed 
in terms of two separate experiments, each 
dealing with four different types of manipu- 
lation. For each experiment, data will be 
presented separately for levels of skilled per- 
formance and for performance during learn- 
ing. In addition, data concerning the meas- 
ures of manipulation movements themselves 
will be mentioned. 


Interaction of Component Movements in 
Skilled Performance 


Figure 3 summarizes the differences in the 
duration of a travel movement of the arm, 
24 inches in length, that occurred in relation 
to the eight different types of manipulation 
used in the experiments. Figure 3A gives 
the data for the first experiment and 3B for 
the second. The bar graphs represent the 
mean duration of the travel movement for 
the different types of manipulation on the 
last day of training, i.e., Day 4. 

Figure 3 shows that, during skilled performance, 
very marked differences occur in a 
travel movement of fixed length when this 
movement is associated with different types 
of manipulation. In the first experiment significant 
differences are found in the travel 
movement for all types of manipulation except 
the pull movement and pressing the push 
button. The bracket in Fig. 3A indicates 
that the difference between these 
two forms of manipulation, as evaluated in 
terms of the Duncan Range Test, is not significant. 
Significant differences also occur in 
the travel movement under the four conditions 
of manipulation in the second experiment. 
The dial-setting manipulation produces 
a travel movement markedly different 
from that observed in the other three conditions. 
In terms of the Duncan Range Test, 
this difference is significant at the .01 level. 

2 The measures on which the statistical analyses of 
this experiment are based, along with pertinent summary 
tables not included in the presentation of the 
results, have been deposited with the American Documentation 
Institute. Order Document No. 4810 from 
ADI Auxiliary Publications Project, Photoduplication 
Service, Library of Congress, Washington 25, 
D. C., remitting in advance $1.25 for microfilm, or 
$1.25 for photocopies. Make checks payable to Chief, 
Photoduplication Service, Library of Congress. 

Fig. 3. Differences in the duration of a travel 
movement with different types of manipulation. 
Figure A gives the data for the first experiment and 
Fig. B that for the second. [Bar graphs; ordinate in seconds.] 

The percentage differences in a travel movement 
related to different conditions of manipulation 
may be described as follows. In 
the first study, if we use the knob-turning 
manipulation condition as giving a travel-movement 
duration representing a 100 per 
cent base, then the other three conditions 
are, in order, 112, 122, and 128 per cent of 
that base time. In this first study, the travel 
movement related to the push-button manipulation 
was of longest duration. In the 
second study, if we let the duration of the 
travel movement associated with the downward 
switching manipulation represent a 100 
per cent base, the durations of the travel 
movement in the other three conditions are, 
in order, 104, 104, and 152 per cent of this 
base time. The manipulation condition giving 
the biggest difference in the travel movement 
in relation to all other conditions is the 
dial-setting task. 

Fig. 4. Learning curves for a travel movement of 
fixed length under different conditions of manipulation. 
The top curves (A) are for the first experiment 
and the bottom curves (B) for the second. 
Note the difference in the scale for the two curves. 
[Ordinate: seconds. Curves in A: push, pull, toggle, turn. 
Curves in B: turning left, pressing together, pressing down, dial setting.] 
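
The percentage-of-base comparison is simple arithmetic and can be sketched as below. The mean durations in `example` are hypothetical values chosen for illustration; only the percentages in `reported` come from the text, and the `percent_of_base` helper is a name introduced here.

```python
def percent_of_base(durations, base):
    """Express each mean duration as a whole percentage of the base condition."""
    reference = durations[base]
    return {name: round(100.0 * t / reference) for name, t in durations.items()}


# Hypothetical mean travel times in seconds (illustrative only).
example = {"turn": 2.00, "toggle": 2.24, "pull": 2.44, "push": 2.56}
print(percent_of_base(example, base="turn"))

# Percentages reported in the text, base condition first.
reported = {
    "first experiment": [100, 112, 122, 128],
    "second experiment": [100, 104, 104, 152],
}
largest_increase = {name: max(values) - 100 for name, values in reported.items()}
print(largest_increase)
```

The largest increase over base in the second experiment, 52 percentage points, is the figure cited later in the Discussion.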


Interaction of Travel and Manipulation in 
Learning 


Figure 4 presents learning curves for the 
travel movement observed in this study. 
These curves are based on four days of prac- 
tice. Separate curves are drawn for the eight 
different conditions of manipulation used in 
the study. Figure 4A gives the curves ob- 
tained in the first experiment and 4B those 
obtained in the second. The time scales of 
the two sets of curves are different because 
of the marked differences in the level of the 
functions in the second experiment. 

There are two major points to be noted 
about the two sets of curves of Fig. 4. The 
first is that marked learning effects occur in 
the travel motion for some conditions of manipulation 
and not for others. The right-turn 
manipulation and the downward switching 
motion give travel movements that change 
hardly at all throughout practice. The left-turn, 
squeeze, upward-switching (Toggle), 
and pulling manipulations are associated with 
travel movements that show slight changes 
due to learning. The push-button and dial-setting 
manipulations give travel motions that 
show marked learning effects. Thus, the degree 
to which practice will change the duration 
of a travel motion of constant length will 
depend upon the type of manipulation involved. 

Separate analyses of variance were carried 
out for the measurements of travel movements 
in the two different experiments. These 
analyses are summarized in Table 1. In both 
experiments, the critical variable, conditions 
of manipulation, is significant at the .01 level. 
Results similar to these general findings have 
already been discussed for the differences on 
Day 4 of each experiment, as shown graphically 
in Fig. 3. In the general analyses of 
Table 1, it is also observed that the variable 
“Days” turns out to be significant, as well 
as the Conditions X Days interaction. These 
findings serve to give some additional meaning 
to the learning curves discussed in relation 
to Fig. 4. Significant learning effects 
occur in the travel movements under certain 
conditions of manipulation, but the specific 
condition defines the nature of these effects. 


Table 1 

Summary of the Analysis of Variance of the Travel-Time Data 

Experiment 1 

Source                             F 
Days                             6.855** 
Subjects                         8.730** 
Conditions                     105.342** 
Days X Subjects                  1.299 
Days X Conditions                 .839 
Conditions X Subjects            2.720** 

[Degrees of freedom, sums of squares, and mean squares for Experiment 1 are not legible in this copy.] 

Experiment 2 

Source                          df        SS     Mean Square        F 
Days                             3      4.199       1.399        11.476** 
Subjects                        23     22.776        .990        52.732** 
Conditions                       3    108.241      36.080       203.073** 
Days X Subjects                 69      8.415        .122         6.494** 
Days X Conditions                9      3.258        .362        19.276** 
Conditions X Subjects           69     12.259        .178         9.361** 
Conditions X Subjects X Days   207      3.882 

** Significant at 1% level. 
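
The arithmetic behind an analysis-of-variance summary of this kind can be checked from the sums of squares and degrees of freedom that Table 1 reports in full. In the sketch below the mean squares are simply SS/df. The choice of error term for each F ratio is an assumption inferred from the printed values (Days against Days X Subjects, Conditions against Conditions X Subjects); the authors do not state which error terms they used.

```python
# Sums of squares and degrees of freedom as printed in Table 1 for the
# block that reports them in full.
table = {
    "Days": (4.199, 3),
    "Subjects": (22.776, 23),
    "Conditions": (108.241, 3),
    "Days x Subjects": (8.415, 69),
    "Days x Conditions": (3.258, 9),
    "Conditions x Subjects": (12.259, 69),
    "Conditions x Subjects x Days": (3.882, 207),
}

# Mean square = sum of squares / degrees of freedom.
mean_square = {source: ss / df for source, (ss, df) in table.items()}

# Assumed error terms (an inference from the printed F values, not a
# statement by the authors).
error_term = {
    "Days": "Days x Subjects",
    "Conditions": "Conditions x Subjects",
}
f_ratio = {
    source: mean_square[source] / mean_square[error]
    for source, error in error_term.items()
}

for source, ms in mean_square.items():
    print(f"{source:30s} MS = {ms:7.3f}")
for source, f in f_ratio.items():
    print(f"{source:30s} F  = {f:7.2f}")
```

Under these assumptions the computed mean squares agree with the printed column, and the two F ratios come out close to the printed 11.476 and 203.073.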


Comparison of the Duration of Different Ma- 
nipulative Movements 


In order to clarify further the main points 
of this study, a brief examination will be 
made of the differences in duration of the 
actual manipulative movements which were 
carried out. Figure 5 summarizes learning 
curves for the eight different manipulative 
movements during the four days of practice. 
Figure 5A gives the data for the first experiment 
and 5B for the second study. Again 
the second set of curves has been drawn on 
a different time scale than that used in the 
first study because of the marked time differences 
in the data concerning dial setting. 

Fig. 5. Learning curves over the four days of 
practice for the different types of manipulation. The 
top curves are for the first experiment and the bottom 
curves for the second. Note the differences in 
the time scale for the two sets of curves. 
[Ordinate: seconds; abscissa: days. Curves in B: dial setting, pressing down, turning left, pressing together.] 

Examination of the curves of Fig. 5 shows 
that marked learning effects are found for all 
types of manipulation used. The types show- 
ing the least learning in the situation are the 
two turning movements and the pressing ma- 
nipulation. The sharpest learning effects are 
observed for dial setting, pressing up a toggle 
switch, and the pull manipulation. 

Some degree of interaction of the different 
learning functions is graphically portrayed in 
Fig. 5, but the instances of such effects are 
limited. In general, the original relative dif- 
ferences between the different forms of ma- 
nipulation are maintained throughout the 
learning period. It is to be noted that the 
perceptually loaded manipulation of dial set- 
ting gives the longest manipulation times, and 
the pull type of manipulation the shortest 
time. 

The differences portrayed in Fig. 5A have 
been examined in terms of analysis of vari- 
ance in order to give an illustrative appraisal 
of the data obtained in this phase of the ex- 
periment. Conditions of manipulation were 
found to be significant at the .01 level of 
confidence. The variable days, which is the 
basis of learning effects in this study, also is 
significant at the .01 level. Accordingly, both 
the types of manipulation and learning effects 
are significant in this particular experiment. 


Discussion 


High-precision electronic methods of mo- 
tion analysis have been developed especially 
for the experimental investigation of the sys- 
tematic problems of industrial time-and-mo- 
tion study. These methods utilize special 
vacuum-tube switching circuits and electronic 
interval timers that permit timing of the dura- 
tion of different component movements at or 
beyond a precision of 0.001 seconds. In this 
study these methods have been used to de- 
termine whether or not a travel movement of 
the arm of fixed length (24 inches) changes 
in duration when it is related to different con- 
ditions of manipulation. Eight different types 
of manipulation were used in the study. 

The main results of the experiments prove 
that there is marked interaction between the 
travel and manipulative parts of a skilled 
task. In skilled performance a travel movement 
of fixed length can be changed in duration 
as much as 52 per cent with different 
types of manipulation. 

The interaction between travel and ma- 
nipulation components of motion are found 
also during learning. The extent to which 
practice affects a travel movement of fixed 
length depends on the type of manipulative 
movement with which it is associated. This 
experiment proves that the presence or ab- 
sence of learning in a given travel movement 
may be determined entirely in terms of the 
interaction of this movement with manipulative
reactions.

In general, types of manipulation that are 
perceptually loaded, such as dial setting, or 
require exact positioning of the hand, such as 
a push-button manipulation, are associated 
with travel movements of relatively long dura- 
tion. Travel movements in these same con- 
ditions of manipulation also show the most 
marked changes during learning. 

Measurements of the duration of eight dif- 
ferent types of manipulation used in the 
study indicate that significant differences oc- 
cur in the duration of these movements. Of 
the types of manipulation studied, dial-set- 
ting manipulation gives the longest time, and 
pulling a small latch gives the shortest time. 

Present methods of time study in industry, 
including predetermined time-standard sys- 
tems, assume an independence of the com- 
ponent movements in motion. The presence 
of interaction between the component move- 
ments making up an industrial task acts as 
an error-producing factor in both direct time 
study and in the application of a predeter- 
mined time standard. The error in applica- 
tion of one widely used predetermined stand- 
ard system is said to be of the order of 15 
per cent. In this study we find that a given 
component movement in a task, a travel 
movement, may change in duration as much 
as 52 per cent due to its interaction with dif- 
ferent types of manipulation. 

At this point, it may be worth while to 
mention that the results just described are 
based on data secured from subject operators 
after they had reached a skilled level of performance
in the task situation that is not materially
improved by further practice. Accordingly,
this research is based on the motion
analysis of a level of skill that is very
comparable to sustained work in the indus- 
trial situation. Furthermore, it is our gen- 
eral notion that Ss studied here were well 
motivated and highly cooperative, and that 
their work is equivalent in every major way
to that of the industrial worker. Rather than receiving
money for their work, Ss in this study
received point credits toward their final grades 
in courses in psychology. 

It is also our purpose to point out the com- 
pleteness of the motion analysis carried out 
here. If the motions investigated in this re- 
search had been measured by methods of 
micromotion analysis, using film speeds at 
100 frames per second to time the component 
movements, approximately 1,200,000 feet of 
film would have been necessary to conduct 
the work. The electronic methods of motion 
analysis permit relatively comprehensive ex- 
amination of the problems of time study in 
industry. 

In prior studies (2, 6, 8) it has been shown 
that learning affects the travel
and manipulation movements in motion differently. Observations
made in this study prove that the
degree of learning which will occur in the
travel and manipulative parts of a task depends
not only upon the type of movement
itself, but also upon the interaction of a 
movement with other component parts of the 
task. The rate of learning a travel move- 
ment of fixed length changes in relation to 
the types of manipulation with which this 
movement is associated. 

The results just noted are not entirely nega- 
tive for the industrial applications of motion 
analysis. The basic problem of motion study 
in industry is a scientific one involving de- 
tailed understanding of the properties and 
causation of movements used in work. This 
research provides accurate measures of the 
interaction of manipulative and travel com- 
ponents of movement in common tasks which 
may be used in handling both practical and 
theoretical problems of motion study. Sim- 
plified concepts of elemental movements in 
motion are not an adequate theoretical foun- 
dation for industrial time-and-motion study. 







The interaction of the separate component 
movements in motion is a fundamental prob- 
lem not only in industry but also in general 
experimental psychology. Inadequate meth- 
ods of motion analysis have limited the study 
of this problem heretofore. 

It is a common assumption that the process 
of learning is the decisive factor in the de- 
termination of the integration of movements. 
This experiment proves otherwise. The find- 
ings of this study are that the perceptual and 
reactive make-up of one component move- 
ment in a task defines the role of learning 
itself on all parts of the task. The nature of 
one part of a motion, e.g., the manipulation 
component, will not only determine the ex- 
tent to which learning affects this movement, 
but also defines the role of learning in chang- 
ing other component movements in the task. 

The advances in methods of this experi- 
ment are perhaps more important than the 
specific results reported on movement inter- 
action. The electronic methods of motion 
analysis developed here make possible the 
broad experimental study of the integration 
and organization of movements in psycho- 
motor skill in relation to learning, motiva- 
tion, emotion, growth, and other general as- 
pects of behavior. 


Summary 


High-precision electronic methods of mo- 
tion analysis have been developed and ap- 
plied to a problem of the interaction of the 
component movements in patterned motions. 
The experiment consisted in measuring the 
variation in a travel movement of constant 
length when this movement was performed in 
relation to eight different types of manipula- 
tion. 




The results show that the duration of a 
travel movement of fixed length may change 
as much as 50 per cent when it is associated 
with different forms of manipulation. Fur- 
thermore, the degree to which this travel 
movement changes during learning depends 
on the type of manipulative movement with 
which it is related. 

The results are discussed in relation to in- 
dustrial time-and-motion study and in terms 
of their bearing on the general problem of 
integration of the component movements in 
motion. 


Received August 18, 1955. 


References 


1. Cohen, L., & Strauss, L. Time study and the fundamental nature of manual skill. J. consult. Psychol., 1946, 10, 146-153.

2. Harris, S. J., & Smith, K. U. Dimensional analysis of motion: VII. Extent and direction of manipulation movements as factors in defining motions. J. appl. Psychol., 1954, 38, 126-130.

3. Lifson, K. A. Errors in time-study judgments of industrial work pace. Psychol. Monogr., 1953, 67, No. 5 (Whole No. 355).

4. Rubin, G., & Smith, K. U. Learning and integration of movements in a pattern of motion. J. exp. Psychol., 1952, 44, 301-305.

5. Ryan, T. A., & Smith, Patricia C. Principles of industrial psychology. New York: Ronald, 1954. Pp. xiv and 534.

6. Simon, J. R., & Smader, R. C. Dimensional analyses of motion: VIII. The role of visual discrimination in motion cycles. J. appl. Psychol., 1955, 39, 5-10.

7. Smader, R. C., & Smith, K. U. Dimensional analyses of motion: VI. The component movements of assembly motions. J. appl. Psychol., 1953, 37, 308-314.

8. Wehrkamp, R., & Smith, K. U. Dimensional analyses of motion: II. Travel distance effects. J. appl. Psychol., 1952, 36, 201-206.





The Journal of Applied Psychology 
Vol. 40, No. 4, 1956 


The Speed and Accuracy of Reading Horizontal, Vertical,
and Circular Scales ¹


Norah E. Graham 


The Nuffield Department of Industrial Health, University of Durham 
(King’s College, Newcastle upon Tyne) 


A series of experiments has been designed 
to compare the human response to numerical 
information displayed on horizontal, vertical, 
and circular scales. It has already been 
shown (2, 3) that if an operator has to con- 
trol a moving pointer on a scale by turning a 
knob, then speed and accuracy are greatest 
when the horizontal scale is used. This sug- 
gests that clockwise rotation of the control 
knob is naturally associated with pointer 
movement from left to right when the con- 
trol is vertically below the display. The 
principal value of this work lies, therefore, in 
the information which it gives about display- 
control relations: the subjects could have ig- 
nored all the scale markings except the one 
on which the pointer was to be kept. The 
comparison of the three types of display is 
not complete without some measure of the 
speed and accuracy of making scale readings,
and this is the purpose of the experiments
described here.


Method 


The subjects (Ss) read the scales from a projected 
cinefilm. Horizontal, vertical, and circular scales, 
identical to those used in the previous experiments 
(Fig. 1) were drawn in white ink on black paper. 
The pointer, which was cut out of aluminum foil 
and painted white, was placed opposite the appro- 
priate number as each scale was photographed. A 
16-mm. cinecamera was used and the timing regu- 
lated by the successive-frame exposure technique. 
The camera was fitted with an accurate frame coun- 
ter worked from the shutter shaft so that each frame 
was counted as it was exposed. The speed of projection
was 24 frames per sec., so that, for example,
a setting which the Ss were to see for ½ sec. was exposed
for 12 frames. Each exposure was followed
by 8 sec. of black spacing which allowed Ss to write 
down the scale reading. The word “READY” then
appeared on the screen, for 2 sec., to prepare Ss for
the next scale.

¹ Acknowledgments are due to Professor R. C. Browne for his advice in this work, and to Mr. H. Campbell, B.A., F.S.S., for statistical help; also to the Department of Photography, Medical School, King’s College, for their cooperation in making the films.
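The frame arithmetic behind the film timing can be sketched as follows. This is a minimal illustration rather than the procedure actually used in making the film: only the projection speed of 24 frames per sec. and the 8-sec. and 2-sec. durations come from the text, and the function name `frames_for` is hypothetical. At 24 frames per sec., a 12-frame exposure corresponds to half a second on the screen.

```python
from fractions import Fraction

PROJECTION_FPS = 24  # projection speed stated in the text (frames per sec.)


def frames_for(seconds, fps=PROJECTION_FPS):
    """Return how many frames must be exposed to show an image for `seconds`."""
    frames = Fraction(seconds) * fps
    if frames.denominator != 1:
        # A film can only hold whole frames.
        raise ValueError("duration is not a whole number of frames")
    return int(frames)


print(frames_for(Fraction(1, 2)))  # 12 frames for a half-second exposure
print(frames_for(8))               # 192 frames of black spacing
print(frames_for(2))               # 48 frames for the "READY" signal
```

Exact fractions are used instead of floating-point numbers so that durations such as one third of a second would also convert without rounding error.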

The projected circular scale was 5.1 in. in diameter 
and the horizontal and vertical scales were 16 in. in 
length. The intervals between scale markings were 
therefore the same on all three displays. The scales 
were viewed from a distance of 40 in. and appeared
approximately at eye level. The angle subtended at 
the eye by the image of a scale on the screen was 
comparable to that subtended by the displays in 
the tracking experiments. 
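The statement that the graduation intervals were the same on all three displays can be checked with a short calculation. The sketch below assumes that the circular scale was graduated around its full circumference and that each scale carried ten numbered units (0 to 10); neither assumption is stated explicitly in the text, and the names used are illustrative only.

```python
import math

CIRCULAR_DIAMETER_IN = 5.1  # projected diameter of the circular scale (from the text)
LINEAR_LENGTH_IN = 16.0     # projected length of the linear scales (from the text)
NUMBERED_UNITS = 10         # assumption: scales run from 0 to 10


def circumference(diameter):
    """Length of scale available around a circle of the given diameter."""
    return math.pi * diameter


circular_length = circumference(CIRCULAR_DIAMETER_IN)
print(round(circular_length, 2))                   # 16.02 in.
print(round(circular_length / NUMBERED_UNITS, 2))  # about 1.6 in. per numbered unit
print(LINEAR_LENGTH_IN / NUMBERED_UNITS)           # 1.6 in. per numbered unit
```

Under these assumptions the circular scale is about 16.0 in. long, which matches the 16-in. linear scales to within a few hundredths of an inch.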

The film started with one example of each scale, 
which remained on the screen for 10 sec.; the cor- 
rect reading appearing alongside the scale after the 
first 5 sec. This was followed by 9 practice read- 
ings, three on each scale, and then by the test itself. 
In both practice and test, the exposure time was 
½ sec., this value having been chosen as the result
of a pilot experiment. 

When choosing the test numbers, the scales were 
considered as being made up of five major segments
(0-2, 2-4, 4-6, 6-8, and 8-10), and on each scale
two readings were chosen in each segment. The
subdivisions within the major segments were divided 
into two groups: 

1. .1, .4, .6, and .9, all of which are next to an 
extra long graduation mark, and, 

2. .2, .3, .7, and .8, all of which are two subdivisions
away from such a well-defined scale marking.

On each scale five readings were chosen in the first 
of these two groups and five in the second. Thus, 
with only three scales, five major segments and two 
types of subdivisions to be considered, it was only 
necessary for each subject to make 3 × 5 × 2 = 30
readings in order for a complete analysis of the results
to be possible.
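The layout of the 30 readings can be illustrated by enumerating the cells of the design. This is only a sketch of the crossing of the three factors: the labels are paraphrased from the text, and the actual test numbers assigned to each cell are not reproduced here.

```python
from itertools import product

scales = ["horizontal", "vertical", "circular"]
segments = ["0-2", "2-4", "4-6", "6-8", "8-10"]
subdivision_groups = [
    "next to a long graduation mark (.1, .4, .6, .9)",
    "two subdivisions from a long mark (.2, .3, .7, .8)",
]

# One reading per combination of scale, major segment, and subdivision group.
cells = list(product(scales, segments, subdivision_groups))
print(len(cells))  # 30 readings per subject

# Each scale therefore carries ten readings: two in every major segment.
per_scale = {scale: sum(1 for cell in cells if cell[0] == scale) for scale in scales}
print(per_scale)
```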

Sixty male university students, all studying some 
branch of engineering, acted as Ss. 


Results 


The Ss’ responses were scored as follows: 

Correct readings scored 0. 

Readings in error by ± 0.1 scale units
scored 1. 

All other errors and omissions scored 2. 
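The scoring rule can be written as a small function. This is a minimal sketch, assuming that the one-point category covers an error of 0.1 scale units in either direction and that an omitted reading is represented by `None`; the function name `score_reading` is hypothetical.

```python
def score_reading(true_value, reported):
    """Score one scale reading: 0 if correct, 1 if off by 0.1, otherwise 2."""
    if reported is None:
        return 2  # omissions are scored like the largest errors
    # Work in whole tenths of a scale unit to avoid floating-point surprises.
    error_tenths = abs(round(reported * 10) - round(true_value * 10))
    if error_tenths == 0:
        return 0
    if error_tenths == 1:
        return 1
    return 2


print(score_reading(1.4, 1.4))   # 0
print(score_reading(1.4, 1.5))   # 1
print(score_reading(1.4, 1.6))   # 2
print(score_reading(8.6, None))  # 2
```

A subject's total over the 30 readings can range from 0 (all correct) to 60.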

The resulting distribution of scores was ap- 
proximately normal. Marked improvement in 
performance occurred during the practice exposures,
but the scores obtained during the
experiment proper show no systematic improvement.

The error score for each segment of the
three scales is shown in Table 1. The high
incidence of mistakes at the ends of the
scales is very noticeable. This is to be expected
on the linear scales as it may take
longer to find the pointer in these positions,
but it is surprising to find a similar trend on
the circular scale.

In an analysis of variance (Table 2) the
three variables, subjects, scales, and units,
and their first- and second-order interactions
were considered.

Fig. 1. Horizontal, vertical, and circular scales.

Table 1
The Total Error Score for Each Segment of the Three Displays

Major                      Scale                    Total
Segment       Horizontal   Vertical   Circular      Error
0-2               80          72         78          230
2-4               33          58         38          129
4-6               44          48         ..          132
6-8               18          64         ..          123
8-10              53          ..         ..          240

Total error      228          ..         ..          854

[.. = entry not legible in the source; in the incomplete rows the surviving values are shown in reading order.]

The first-order interaction between scales
and units is significant (P < .001). This
means that the position on the scale in which
the pointer lies has more effect on the accuracy
of reading on one type of scale than
on another, and has been shown to be due to
the very high error at the top of the vertical
scale. Many more mistakes were made between
8 and 10 on this scale than in any
other region of the three displays.

Table 2
An Analysis of Variance of the Errors in Scale Reading

Source                        Variance
Between Ss                     1.185
Between scales                 8.521
Between units                  9.605
Interactions
  Ss and scales                0.4058
  Ss and units                 0.3482
  Scales and units             2.358
  Ss, scales, and units        0.3973
Residue                        0.3877

When compared with this significant inter- 
action, the variance due to the shape of the 
scale is found to be significant at the .05 
level of confidence. It was shown by means 
of the t test that the errors are significantly
greater on the vertical scale than on the hori- 
zontal or the circular scale, but the difference 
between the latter two may be attributed to 
chance. 

Another significant variable is the unit or 
section of the scale in which the pointer lies. 
In this case the t test shows that the liability
to make mistakes is significantly greater at
the ends of the scales in sections 0-2 and 8-10
than in the three middle sections, 2-4, 4-6,
and 6-8.

A more detailed analysis of the results 
showed that the position of the subdivision 
within the major segment (i.e., the tenths) 
had no significant effect on the accuracy of 
reading. The total error score for the group 
of readings ending in .1, .4, .6, or .9 was 457, 
while the total score for those ending in .2,
.3, .7, or .8 was 397.

When compared with the residual variance 
the differences between Ss are highly significant.
The best S read 29 out of the 30
scales correctly, while the poorest made 21
mistakes.

Table 3 shows the frequency with which 
different types of error occurred on the three 
scales. The number of correct readings was 
greatest on the horizontal scale, and even if 
the readers had been allowed a margin of 
error of ± 0.1 scale units, this display would
still have ranked first in order of accuracy. 
Readings on the circular scale, on the other 
hand, were nearly always correct to within 
0.2 scale units and only one reading on this 
display was missed altogether. 

When the direction of the errors is taken 
into account it is seen that there is a tend- 
ency to overestimate a reading by 0.1 or 0.2 
scale units on the circular scale. This was 
particularly true of the four readings 0.2, 
8.6, 1.4, and 4.6. For example, 11 Ss read 
1.4 as 1.6 and 13 read it as 1.5. Only four 
Ss underestimated and called it 1.3. Or 
again, 8.6 was read as 8.8 by nine Ss, and as 
8.7 by 17 Ss, whereas only two mistook it for 
8.5. This accounts for the high error score 
at the extremities of the circular scale, particularly
between 0 and 2, though it does
not explain it. Such a tendency to overestimate
is not peculiar to the circular scale, however.
On the vertical scale errors of +0.1
occur much more frequently than those of
−0.1.


Table 3
The Frequency with Which Errors of Different Magnitude Were Made on Each Scale

[Table body not recoverable from the source. Columns: Horizontal, Vertical, and Circular, each giving number and per cent. Rows: correct readings; errors of +1.0, −1.0, +0.2, −0.2, +0.1, and −0.1; other errors; missed readings; total. Surviving entries: 324, 390, 65.0, 2, 0.3, 0, 2.0, 100.0.]









Discussion 


The gross errors of ± 1.0 scale unit which
occurred in the present experiment were all 
associated with readings in the second half of 
a numbered division. Kappauf (4) remarks 
that under these conditions the scale number 
read is apt to be that nearest to the pointer. 
The tendency noted by the same author to 
“round out” readings, particularly in the first 
numbered interval of scales which start at 
zero, is not apparent in the present experi- 
ment, presumably because of instructions to 
record the zero in such cases; it may, how- 
ever, occur in practical situations. Vernon 
(6) considers that gross mistakes are also 
liable to occur near the zero on circular scales, 
but the present results confirm the finding of 
Sleight (5) that gross errors at the ends of 
a scale are less frequent on scales without a 
clearly defined break. 

The mistakes which do happen at the ends 
of the circular scale are principally local, that 
is to say, of less than one numbered scale 
division. Local errors in any part of the 
scale display a tendency to overestimation. 
This was also noted by Sleight and seems to 
have no obvious explanation. 

Sleight attributes the differences between 
the scales used in his experiment to the varia- 
tion in their “effective” area; the larger the 
area to be scanned the less accurate the read- 
ing. Such an explanation does not account, 
however, for the difference between the hori- 
zontal and vertical scales which he also found 
to be significant and which the present work 
suggests is the more important difference. 
From a physiological point of view, an ex- 
planation can be based on the shape of the 
visual field and the mechanics of eye move- 
ments. Objects that subtend an angle of 
more than 4° at the eye can be detected if 
they lie within a field whose boundaries are 
approximately 100° to the right or left of the 
point of fixation, 70° above it and 80° be- 
low it. The width of the visual field is thus 
considerably greater than its height, which 
is one factor that might favor the reading of 
horizontal scales. This is simply another 
way of saying that the eyes are set in the 
head in a horizontal line. The linear displays 
as they appeared in this experiment subtended
an angle of approximately 10° at the
eye. No difficulty should have been experi- 
enced, therefore, in finding the pointer even 
at the top of the vertical scale. The region 
of foveal vision, however, only subtends an 
angle of about 3° at the eye and, in order to 
read the scale, it is necessary to focus on the 
pointer itself. During very short exposures 
the accuracy of reading therefore depends 
upon the speed with which eye movements 
can be made. Scanning along a horizontal 
line is a relatively simple action involving 
the use of the lateral and medial recti muscles 
only. Raising or lowering the eyes, on the 
other hand, involves the joint action of the 
superior and inferior recti and the inferior 
and superior obliques. According to Duke- 
Elder (1) it has been shown by photographic 
studies that the eyes can follow lines in the 
horizontal plane more easily than in any 
other. It has been found, moreover, that 
horizontal eye movements are the most rapid 
and vertical ones the slowest. When the fact 
that people are accustomed, when reading, to 
scanning along a horizontal line is added to 
this evidence, it is not difficult to explain the 
superiority of the horizontal scale. 


Summary 


1. The speed and accuracy of reading com- 
parable horizontal, vertical, and circular scales 
have been studied by means of a film. Pictures
of the scales were flashed on a screen
at 10-sec. intervals, the exposure time being 
½ sec.

2. The vertical scale is clearly less easy to 
read than either of the other two displays, 
particular difficulty being experienced near 
its ends. 

3. The success of the circular scale may be 
attributed to the fact that it presents a 
smaller area to be scanned. The shape of 
the visual field and the relative ease of mov- 
ing the eyes from side to side, rather than 
up and down, are thought to account for the 
greater accuracy on the horizontal scale. 


Received July 5, 1955. 


References 


1. Duke-Elder, W. S. Textbook of ophthalmology. London: Kimpton, 1932.

2. Graham, N. E., Baxter, I. G., & Browne, R. C. Manual tracking in response to the display of horizontal, vertical and circular scales. Brit. J. Psychol. (Gen. Sec.), 1951, 42, 155-163.

3. Graham, N. E. Manual tracking on a horizontal scale and in the four quadrants of a circular scale. Brit. J. Psychol. (Gen. Sec.), 1952, 43, 70-77.

4. Kappauf, W. E. A discussion of scale-reading habits. USAF, WADC Tech. Rep., 1951, No. 6569.

5. Sleight, R. B. The effect of instrument dial shape on legibility. J. appl. Psychol., 1948, 32, 170-188.

6. Vernon, M. D. Scale and dial reading. Flying Personnel Res. Committee Rep., 1946, No. 668.







Evaluation of a Display Incorporating Quantitative and 
Check-Reading Characteristics 


Martin I. Kurke 


U. S. Army Ordnance Human Engineering Laboratory, Aberdeen Proving Ground 


The present paper describes an evaluation 
of the validity of a new principle underlying 
dial design. It is demonstrated that a hu- 
man operator can check-read the proposed 
quantitative dial with significantly fewer 
errors and with greater speed than the con- 
ventional quantitative dial faces now in use. 
The dial design, described elsewhere in detail
(4), is based upon the principle that a monitor
can visually perceive and interpret simple
high-contrast figures of known symbolism
more readily than he can perceive and react
to such a complex process as reading a
scale (2, 5). In essence, the dial face is so
designed that when the indicator is pointing
to a portion of the scale indicating caution or
danger functions of the machine system, there
appears on the face of the dial a high-contrast
wedge which is not present when the machine
is operating within “safe and normal”
limits (Fig. 1). This wedge insures that
changes in shape, area, hue, and brilliance attract
the eye of the monitor.

Fig. 1. Descriptive drawing of experimental dial design. Disc A less angle a and with pointer p is mounted on disc B. Upon the latter is painted circle C less angle c. Under “Safe and Normal” conditions (II) angles a and c are aligned, but under “Red-line” conditions (III) the painted area C is exposed presenting a contrasting colored wedge (see 4).

The engineering validity of the dial prin- 
ciple was previously investigated and post- 
flight comments were obtained from test 
pilots who flew a Bell Model 47-G helicopter 
in which a prototype dial face was installed 
within the airspeed indicator for a period of 
one month. During this time the aircraft 
was used for other flight demonstrations. In- 
formal interviews and flight test reports (1) 
elicited generally favorable comments from 
the test pilots, who indicated their belief that 
the dial design would be more efficacious in 
a panel consisting of a large number of cru- 
cial dials requiring only an occasional glance 
for monitoring than for a helicopter airspeed 
indicator. 


Method 


Four decks of 50 index cards were prepared, three 
of which had dial faces drawn on them (Fig. 2). 
All dials were two inches in diameter and were numbered
clockwise from zero to ten along the periphery
at 22½-degree intervals with zero at the top.
The indicator was drawn to point to the integers
from 0 through 10, and to ½, 1½, 8½, and 9½. On
thirty of the cards in decks A, B, and C, the indicator
pointed to a number between 1½ and 8½ inclusive.
These represented “safe and normal” operation.
The indicator on the remaining twenty cards
pointed to numbers in the 0-1 and 9-10 intervals. 
These represented “red-line” operation. In control 
deck A no further indications were made on the 
dials. In control deck B, the red-lined areas were 
marked off with a red edging along the dial periphery 
between 0-1 and between 9-10. Otherwise this deck 
was identical with A. Experimental deck C was 
drawn so that when the pointer indicated “safe and 
normal” only the pointer and number showed as in 
deck A. When the indicator pointed to “red-line” 
operation, however, a red wedge appeared on the 
dial face. The wedge increased in size
as a function of deviation from “safe and normal” 
conditions. Deck D consisted of 50 consecutively 
numbered cards. Twenty of these numbers were 
randomly chosen and drawn in red at the top of 
the card; the balance were in black. In the center 
of each card was a two-inch circle, 20 of which
were chosen at random and filled in so that they
appeared as black spots. The remaining 30 were
left as circles.

Fig. 2. The appearance of decks A, B, and C showing “Safe and Normal” and “Red-line” displays.
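The sorting criterion for decks A, B, and C can be expressed as a simple rule. The sketch below assumes that “safe and normal” operation runs from 1½ to 8½ inclusive and that the pointer could rest on the integers 0 through 10 and on ½, 1½, 8½, and 9½; the function name `classify` and the constants are hypothetical and are not taken from the original report.

```python
SAFE_LOW, SAFE_HIGH = 1.5, 8.5  # assumed limits of "safe and normal" operation


def classify(pointer_value):
    """Return the pile into which a card with this pointer value is sorted."""
    if not 0 <= pointer_value <= 10:
        raise ValueError("pointer value lies off the dial")
    if SAFE_LOW <= pointer_value <= SAFE_HIGH:
        return "safe and normal"
    return "red-line"


# Pointer positions assumed to have been drawn on the cards.
positions = sorted(set(range(11)) | {0.5, 1.5, 8.5, 9.5})
for value in positions:
    print(value, classify(value))
```

With these assumptions, nine of the fifteen pointer positions fall in the “safe and normal” range and six in the “red-line” intervals 0-1 and 9-10.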

Of the 33 male Ss used, 9 were Scientific and Pro- 
fessional rated enlisted men (engineers, physicists, 
and mathematicians) assigned to the Army’s Bal- 
listic Research Laboratories; 10 were civilian and 
military engineers and psychologists (5 each) em- 
ployed by the Human Engineering Laboratory; and 
14 were engineering employees of Bell Aircraft Cor- 
poration. The raw time data of the latter group 
were lost, necessitating calculation of raw time-score 
data with only 19 Ss. All other data were based 
on an N of 33. 

The Ss were tested individually. Each S was in- 
formed that he would receive a shuffled deck of 
cards face down, which he was to hold in one hand. 
He was then required to turn the cards over one at
a time and place them in two piles according to a 
separate criterion to be given for each of six card 
sorts. He was informed that accuracy was of prime 
importance and that he could correct any mistake 
provided he had not started to turn over the next 
card in the deck. He was also told that speed was 
almost as important as accuracy and to sort the 
cards as rapidly as he could consistent with accuracy. 
Each S first sorted deck D on the basis of number
color. The sole purpose of this sorting was to en- 
able Ss to get the “feel” of the cards and practice in 
manipulating them. As on all sortings, time and
error scores were recorded. However, for the first
sorting these data were not used. On trial 2, deck 
D was again sorted, this time on the basis of 
whether or not the circles were filled in. Then 
every third S (Group I) was shown deck C, and 
the mechanism of the dial it represents was explained.
He then sorted “safe and normal” from
“red-line” displays. The S then received instruc- 
tions on the dial system for deck A, which he sorted 
on the same basis, followed by a similar procedure 
for deck B for his fifth sorting. Groups II and III 
performed similarly, except that their sequences for 
decks A, B, and C were counterbalanced. For the 
final card-sort, deck D was again sorted on the basis 
of difference in circles. 

In addition to practice in manipulation of the 
cards, sorts 2 and 6 had another purpose. It may 
readily be seen that the card-sorting technique meas- 
ures two things: the speed of discrimination and the 
motor response time. Some method was needed to 
eliminate the effects of motor activity from the time 
scores. Sorting deck D in trials 2 and 6 enabled Ss 
to make a discrimination taking a negligible time to 
perform. We might make the assumption, there- 
fore, that the time to sort on the basis of bright- 
ness discrimination is almost completely the motor 
response time. However, motor time changes with 
practice. Therefore, the mean time of trials 2 and 
6 would be the most reliable estimate of motor time 
on trial 4. Since decks A, B, and C were sorted on
trials 3, 4, and 5 in equal numbers, subtracting
each S’s mean time on trials 2 and 6 from his scores on
trials 3, 4, and 5 yielded the most reliable estimate
of the time taken to make the discriminations in
decks A, B, and C. These scores will be referred to
as “adjusted time scores.”
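The adjustment can be sketched in a few lines. The times used below are hypothetical values for a single S, chosen only to show the arithmetic, and the function name `adjusted_time` is illustrative.

```python
def adjusted_time(raw_time, trial2_time, trial6_time):
    """Subtract the estimated motor time (mean of trials 2 and 6) from a raw sort time."""
    motor_time = (trial2_time + trial6_time) / 2
    return raw_time - motor_time


# Hypothetical sorting times (sec.) for one S; not data from the experiment.
subject = {"trial2": 50.0, "trial6": 46.0, "A": 75.0, "B": 70.0, "C": 53.0}

adjusted = {
    deck: adjusted_time(subject[deck], subject["trial2"], subject["trial6"])
    for deck in ("A", "B", "C")
}
print(adjusted)  # {'A': 27.0, 'B': 22.0, 'C': 5.0}
```

Averaging trials 2 and 6 centers the motor-time estimate on the middle of the three dial sorts, which is why practice effects in card handling are expected to cancel when the deck orders are counterbalanced.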


Results 


Errors. It was felt that the differences in 
errors were so great that a statistical analysis 
would be superfluous. Thirty-three Ss, each 
making 50 discriminations on deck C, made 
a total of only one error out of 1,650 trials. 
Using the conventional “red-line” dial (deck
B) the same Ss made 18 errors, while dials 
without any warning system yielded 39 errors 
(Fig. 3). 
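The error counts can be put on a common footing as rates per decision. The counts and the numbers of Ss and cards are taken from the text; the helper name `error_rate` is hypothetical.

```python
N_SUBJECTS = 33
CARDS_PER_DECK = 50
TRIALS_PER_DECK = N_SUBJECTS * CARDS_PER_DECK  # 1,650 decisions per dial

errors = {"A (no warning)": 39, "B (red-line)": 18, "C (wedge)": 1}


def error_rate(error_count, trials=TRIALS_PER_DECK):
    """Errors expressed as a percentage of all decisions made on a deck."""
    return 100 * error_count / trials


for deck, count in errors.items():
    print(f"{deck}: {count}/{TRIALS_PER_DECK} = {error_rate(count):.2f} per cent")
```

Even on the untreated dial the error rate is below 3 per cent, so the differences between dials lie in a small number of errors against a large number of correct decisions.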

Speed. The mean raw time score for sorting
50 cards with no warning indicator (deck A)
was 73.1 sec. The conventional “red-line”
dial yielded a mean score of 69.6 sec. and the 
experimental display yielded a mean score of 
52.9 sec. The three measures had standard 
deviations of 13.0, 14.9, and 10.3 seconds, 
respectively. 

Mean adjusted time scores for the displays
were 27.8 (σ = 10.0), 20.5 (σ = 9.8), and
4.3 (σ = 4.9) seconds, respectively, for dials
A, B, and C.

Fig. 3. Time and error as a function of dial. [Plotted for dials A, B, and C: error frequency (E/1650 decisions) and time (sec/50 decisions).]

The loss of a portion of the raw data precluded
the reporting of anything but Student’s
t test in determining the differences
between means of raw time scores: t_AB = 0.76
(chance); t_BC = 3.91 (p < 0.01); and t_AC
= 5.17 (p < 0.001). The differences between
untreated and conventional red-lined
dials, the latter and the experimental wedge,
and the untreated and the experimental dials
in terms of adjusted speed scores yielded: t_AB
= 2.92 (p < 0.01); t_BC = 8.35 (p < 0.001);
and t_AC = 10.61 (p < 0.001), respectively.
Owing to the above-mentioned loss of data,
although adjusted scores have been calculated
on the basis of 32 df, the raw scores
were figured on the basis of 18 df.
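The reported degrees of freedom (N − 1 in each case) are consistent with a t test on paired scores, since every S sorted all three dial decks. The sketch below shows that computation under this assumption; the report does not give the formula used, the function name `paired_t` is hypothetical, and the times are invented for illustration.

```python
import math
import statistics


def paired_t(times_x, times_y):
    """Return (t, df) for the mean difference between paired scores."""
    if len(times_x) != len(times_y):
        raise ValueError("each S must contribute one score per condition")
    diffs = [x - y for x, y in zip(times_x, times_y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical adjusted time scores (sec.) for four Ss; not data from the study.
dial_a = [30.0, 28.0, 25.0, 31.0]
dial_c = [5.0, 6.0, 4.0, 3.0]

t_ac, df = paired_t(dial_a, dial_c)
print(round(t_ac, 2), df)
```

Working from the differences within each S removes the large differences between Ss in overall sorting speed from the error term.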


Discussion 


An unexpected result of this study is that
the two control groups differed so little on the
basis of raw time scores. Although no objective
evidence to support it seems to be available,
the widespread practice of red-lining
dials as in deck B to indicate an abnormal
state in a machine system attests to an almost
universal acceptance of the red-line as
an aid to check-reading of dials. It was surprising,
therefore, to learn from the data that
although red lining halves the error score, in
terms of raw time scores any advantage in
speed of reading obtained by use of the conventionally
red-lined dial over the untreated
display is due to chance alone. These results,
of course, apply specifically to dials
read for the purpose of card-sorting. If, however,
the motor aspects of the perceptual-motor
task are removed, the adjusted scores
indicate that red lining does provide a significant
reduction in reading time at the .01
level of confidence.

Experimental dial C proved to be superior 
to the untreated control at the .001 level of 
confidence, and to the red-lined dial at the 
same level when compared on the basis of 
adjusted scores, but only at the .01 level if 
raw scores are considered. A possible ex- 
planation for the apparent superiority of the 
experimental dial lies in the fact that no 
pointer reading is necessary in order to check- 
read the display. Only the simplest of dis- 
criminations is required. This is in accord- 
ance with the fact that a good display is 
easily read and reduces complexity; critical 
displays are very visible, and changed indi- 
cations are easily detectable (6). The inher- 
ent features of the display also agree with 
the principle that “the instrument shall be 
designed in such a way that the reader will 
not have to remember special rules about 
them in order to read without error” (3). 
Presumably remembering numerical limits 
falls within the category of “special rules.” 


Summary and Conclusions 


By use of a card-sorting experiment, a comparison
of three dial designs was made from
the standpoint of accuracy and the speed of
check-reading. Within the limits of this experiment,
it was demonstrated that the conventional
method of red lining a dial to indicate
a deviation from “safe and normal”
operation is significantly better than no “red-line”
indication at all, provided the criteria are
errors or reading time isolated from associated
motor activity. It was also demonstrated
that the experimental dial design principle
is significantly more efficient than the
other two, regardless of which of the three
measures is used in comparison. It is suggested
that the experimental dial design is more
easily read because a simpler form of visual
discrimination is required than for the task
of reading the other dials.


Received September 23, 1955. 


References 


1. Cannon, J. A. In-flight evaluation of “A quali- 
tative instrument face: CDS-16-2-54.” Buf- 
falo, N. Y.: Bell Aircraft Corp. Memo ENG: 
12:4:0914-1:JAC, 1954. 




2. Chapanis, A., Garner, W. R., & Morgan, C. T. 
Applied experimental psychology. New York: 
Wiley, 1949. 

3. Kappauf, W. E. A discussion of scale reading 
habits. USAF, WADC, Tech. Rep., 1951, No. 
6569. 

4. Kurke, M. I. A qualitative instrument face.
Aero Digest, 1955, 70, 24.

5. Reed, J. B. The speed and accuracy of discriminating
differences in hue, brilliance, area, and
shape. Port Washington, L. I., N. Y.: U. S.
Navy, Special Devices Center, 1951. (Tech.
Rep. 131-1-2.)

6. Senders, V. L., & Cohen, J. The display characteristics
of a good instrument. Abstr. Airborne
Electronics Conf. (Inst. Radio Engnrs.),
1953, 27-29.





The Journal of Applied Psychology 
Vol. 40, No. 4, 1956 


Comprehension by Reading versus Hearing 


Wilse B. Webb and Edward J. Wallon¹
U. S. Naval School of Aviation Medicine 


Can a person best comprehend a body of 
information by reading or by hearing? This 
turns out to be a difficult question to answer. 
As pointed out in a recent review, the answer 
is influenced by at least five factors: the type 
of material, the method of presentation, the 
comprehension measures employed, the per- 
ceivers used, and the surrounding conditions 
(2). In a given situation one is most frequently
forced to make an “educated” guess.

We have recently collected a considerable 
amount of comparative data on this problem. 
Our findings are presented to improve our 
“guesses” in situations similar to those ap- 
proximated by our experimental design. Our 
particular conditions were as follows: (a) 
The material was unfamiliar story-form material
containing considerable detail; (b) the
methods of presentation used were by tape 
recording; self-paced, single “read through” 
of printed material; study of printed mate- 
rial; and simultaneous reading-listening; (c) 
comprehension was measured by a true-false 
examination both immediately and for a 24- 
to 48-hour recall; (d) the subjects were 
highly selected college-level males and the 
material was given under standard testing 
conditions. 


Procedure 
Subjects 


The subjects were young, healthy males between 
the ages of 18 and 25. All had a minimum of two 
years’ college training or the equivalent. All had 
been screened as acceptable for naval flight training 
by a general intelligence type test, a biographical in- 
ventory, a mechanical comprehension test, and a 
spatial apperception test. The subjects were tested 
in groups of approximately 30 subjects each. 


Test Material 


It was desirable to use material which had low
familiarity for the subjects. To this end stories of
Greek mythology taken from Bulfinch’s Mythology
were used (1). Six stories were chosen: Pyramus
and Thisbe (947 words), Juno and Io (884 words),
Diana and Acteon (770 words), Nisus and Scylla
(793 words), Callisto (514 words), and Cephalus
and Procris (741 words). In most cases the stories
were taken intact; in several a number of passages
were deleted without affecting the story continuity.

¹ Opinions and conclusions contained in this report
are those of the authors. They are not to be
construed as necessarily reflecting the view or the
endorsement of the Navy Department.

Preliminary analyses indicated that a minimum of 
three stories per test was required for adequate reli- 
ability, enough material on which to construct ques- 
tions, and of sufficient length to achieve adequate 
difficulty of items. 

To meet these needs, two forms of the tests were 
developed. Form I consisted of three stories: Py- 
ramus and Thisbe, Juno and Io, Diana and Acteon; 
Form II was made up of three stories: Nisus and 
Scylla, Callisto, and Cephalus and Procris. The writ- 
ten form of these tests is available from the files of 
the Psychology Laboratory of the Naval School of 
Aviation Medicine. 

The two sets of stories were used so that comparisons
of differences between methods of administration
within subjects could be analyzed. These results
are to be reported elsewhere.

That the material was essentially unfamiliar is attested
by the figures given in Table 1, which presents
the number of stories in each form with which two
representative samples of the population were familiar.


Table 1

Familiarity with Story Material Prior to Testing

No. of Stories          Form I        Form II
Familiar to Subject     (N = 73)      (N = 37)

0                       59            34
1                       10             3
2                       [illegible]    0
3                       [illegible]    0


Measures of Comprehension 


For each story 16 true-false questions were con- 
structed. On the basis of an item analysis of the 
first several presentations of each form several items 
were eliminated from each form as either not dis- 
criminating between the upper and lower fourths of 
the distributions or as having unfortunate pass-fail 
proportions. The resulting numbers of items for each
story in each form were:


Form I                        Form II

Pyramus and Thisbe    15      Nisus and Scylla        14
Juno and Io           15      Callisto                16
Diana and Acteon      15      Cephalus and Procris    13

Total                 45      Total                   43
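
The item analysis described above can be sketched as follows. This is a minimal illustration under assumed conventions: subjects are ranked by total score, the upper and lower fourths are compared on each item, and an item is dropped if it fails to discriminate or has an extreme pass rate. The cut-off values (`min_disc`, `min_pass`, `max_pass`) and all function names are hypothetical; the article does not report the thresholds actually used.

```python
def item_analysis(responses, fraction=0.25):
    """Per-item pass rate and upper-minus-lower discrimination.

    responses: one list per subject holding 1 (correct) or 0 (wrong) per item.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, int(len(ranked) * fraction))      # size of each extreme group
    upper, lower = ranked[:k], ranked[-k:]
    stats = []
    for j in range(len(responses[0])):
        p_upper = sum(s[j] for s in upper) / k
        p_lower = sum(s[j] for s in lower) / k
        pass_rate = sum(s[j] for s in responses) / len(responses)
        stats.append({"item": j, "pass_rate": pass_rate,
                      "discrimination": p_upper - p_lower})
    return stats

def keep_item(stat, min_disc=0.2, min_pass=0.2, max_pass=0.9):
    """Hypothetical retention rule: discriminating and not too easy or too hard."""
    return (stat["discrimination"] >= min_disc
            and min_pass <= stat["pass_rate"] <= max_pass)
```

An item that everyone passes has zero discrimination and a pass rate of 1.0, so it would be eliminated under either criterion.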


Method of Presentation 


Auditory. The two forms of the tests were tape 
recorded. The following set of instructions was first 
recorded: 

“In this recording you will be presented with a 
group of three stories. Listen to each carefully. 
After the stories have been concluded, they will be 
followed by a series of statements. You are to in- 
dicate whether each statement is true or false by 
filling in between the dotted lines of your IBM an- 
swer sheet under the small number “1” if the state- 
ment is true, or by filling in between the dotted lines 
under the small number “2” if the statement is 
false.” 

The test (Form I or Form II) was then presented. 
Following the reading of the three stories constitut- 
ing a given form, the questions were then given on 
the tape. Answers were marked on IBM answer 
sheets. 

Read through. The two forms of the tests were 
mimeographed. The following set of instructions 
was given verbally: 

“You have each been given three sets of reading 
material. You will be asked to read these materials 
following which you will be given a brief test con- 
taining true and false statements which you are to 
answer. 

“When the signal is given, you are to turn over 
your papers and begin reading. It is not necessary 
to read fast or hurry. This is not a test of reading 
speed. You are to read at your usual speed. Read 
through each of the stories once, and only once. 
Do not reread or review any sentences. Read each 
statement just once and go on to the next sentence 
without pausing between sentences to memorize or 
study the material you have gone over. Do not re- 
turn to any parts of the material you have already 
read.” 

The subjects were then given a set of mimeo- 
graphed questions with IBM answer sheets. Again, 
they were told to read the questions only once and 
not to delay on any question. They were to answer 
on the IBM answer sheets. 

Read-study. The subjects were given the mimeo- 
graphed material and told to study the material. 
They were permitted to study the material for a pe- 
riod equivalent to the time required to present the 
material auditorily; this was seventeen minutes. 
The stories were then taken up and the subjects
were given the mimeographed questions and IBM 
answer sheets. They were then instructed to answer 
the questions. Again they were given the time re- 
quired to present the material verbally. 




Auditory-read. The subjects were given the mimeo- 
graphed stories, told that these stories had been re- 
corded, and instructed to listen to the stories and 
use the written material as they so desired. They 
were informed that an objective exam would be 
given on the material of the stories and that the 
written material would be taken up at the end of 
the recording. At the end of the recording the 
stories were taken up and written questions and 
IBM answer sheets were passed out. The recorded 
questions were played and the subjects answered on 
IBM answer sheets. The written questions were 
taken up along with the answer sheets at the end 
of the recording. 

The procedure for presenting the stories varied 
among the different groups. Some groups received 
only one type of administration. Several groups re- 
ceived the two forms of the test under the same 
type of administration. Still other groups received 
the tests under two different kinds of administra- 
tion. These tests were always given at least 24 
hours apart. No practice effect was found under 
these conditions so each test has been considered in- 
dependently. 

In all of these presentations the subjects were 
seated in a group testing room and the tests were 
administered along with other tests as a part of the 
routine procedure of entering the Training
Command.


Results 
Reliability of Measures 


The forms and methods of administration
yielded the split-half reliability estimates
given in Table 2. These are odd-even correlations
corrected by the Spearman-Brown
formula.
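
The odd-even procedure with the Spearman-Brown correction can be sketched as below. The step-up formula for a test of doubled length is r_full = 2r / (1 + r), where r is the correlation between the two half-tests. The function names and the input layout are illustrative assumptions, not taken from the article.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half):
    """Estimate full-length reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(responses):
    """responses: one list of 0/1 item scores per subject, in test order."""
    odd_scores = [sum(s[0::2]) for s in responses]    # items 1, 3, 5, ...
    even_scores = [sum(s[1::2]) for s in responses]   # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_scores, even_scores))
```

For example, a half-test correlation of .60 steps up to a full-test estimate of .75.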


Comparison of Methods of Presentation 


The means, sigmas, and number of sub- 
jects for the methods of presentation by 
forms are given in Table 3. 

Separate analyses of variance by methods
were completed for the two forms of the comprehension
measures. Significant F ratios between
methods were obtained in each analysis.
These results are given in Table 4.


Table 2

Split-half Reliabilities Among Methods of Administration

Method N Form II
Form I
Auditory .76 113
Read-through a2 122
Read-study 12 94
Auditory-read 46


Table 3

Means, Sigmas, and Ns for Methods of Presentation

Form I

Method            N      Mean

Auditory          118    35.60
Read-through      182    34.55
Read-study         46    37.04
Auditory-read      36    37.33

[The SD column and the Form II entries are not legible in this copy.]

Using the error variance obtained in analyses
for the separate forms, t tests between the
various methods were made. These results
are presented in Table 5.
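
The two computations named above, an F ratio between methods and t ratios that reuse the error variance from the same analysis, can be sketched from summary figures alone. The sketch assumes that the Form I group sizes and means surviving in Table 3 pair off with the methods in the order listed, and that those same groups entered the analysis whose within-methods sum of squares appears in Table 4. The function names are illustrative, and the output is a recomputation under those assumptions, not the values printed in the tables.

```python
import math

# Form I group sizes and means as they survive in Table 3 (assumed pairing),
# and the within-methods sum of squares from Table 4.
FORM_I = {
    "Auditory": (118, 35.60),
    "Read-through": (182, 34.55),
    "Read-study": (46, 37.04),
    "Auditory-read": (36, 37.33),
}
SS_WITHIN_FORM_I = 9189.3

def between_ss(groups):
    """Between-methods sum of squares from group sizes and means."""
    n_total = sum(n for n, _ in groups.values())
    grand_mean = sum(n * m for n, m in groups.values()) / n_total
    return sum(n * (m - grand_mean) ** 2 for n, m in groups.values())

def f_ratio(groups, ss_within):
    """Return (F, error variance) for a one-way analysis of variance."""
    n_total = sum(n for n, _ in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_within = ss_within / df_within
    return (between_ss(groups) / df_between) / ms_within, ms_within

def t_between(groups, a, b, ms_within):
    """t ratio between two methods, using the pooled error variance."""
    (n_a, m_a), (n_b, m_b) = groups[a], groups[b]
    return (m_a - m_b) / math.sqrt(ms_within * (1 / n_a + 1 / n_b))

f_value, error_variance = f_ratio(FORM_I, SS_WITHIN_FORM_I)
print(round(between_ss(FORM_I), 1), round(f_value, 2))
print(round(t_between(FORM_I, "Read-study", "Read-through", error_variance), 2))
```

With these inputs the between-methods sum of squares comes out within rounding error of the 394.7 printed in Table 4, and the F ratio exceeds the 3.84 criterion given in the table footnote.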

The results of the statistical analyses may 
be summarized as follows: 


1. The material was equally effectively ac- 
quired by one hearing of the material or by 
one read-through of the material. 

2. Study of the material for a period of 
time equal to that required for one verbal 
presentation is more effective than either the 
verbal presentation or a single read-through. 

3. Simultaneous reading and hearing of the
material was more effective than either a
single reading or hearing but no more effective
than studying the material for an equal
period of time.


Table 4

Analysis of Variance by Methods

Source of Variation       Sum of Squares

Form I
  Between methods            394.7
  Within methods            9189.3
  Total                     9584.0

Form II
  Between methods           1402.7
  Within methods            5197.0
  Total                     6599.7

* 3.84 significant at .01.

[The df, Variance, and F columns are not legible in this copy.]


Table 5

Table of t Ratios Obtained Between the Various
Methods of Presentation

[Columns: Read-Study, Auditory, Auditory-Read. Rows, for Form I
and for Form II: Read-through, Read-study, Auditory. The t
values are not legible in this copy.]

* Significant at the .01 level.
** Significant at the .001 level.


Conclusions 


It was noted in the introduction that broad 
generalizations regarding reading versus hear- 
ing are hard to come by. However, one can 
generalize within a given class of conditions. 
Let us recall the conditions of the experiment: 


1. The subjects were college level and 
screened on intelligence (the average ACE is 
approximately 124.64). 

2. The material was in story form with 
considerable detail containing approximately 
2,300 words. 

3. The method of measurement was ex- 
tensive true-false questions covering both de- 
tail and general aspects. 

Under the conditions of the experiment the
results are quite clear, with the two test forms
yielding complementary findings. These
results were:


1. A single read-through of the material 
and hearing the material read once resulted 
in equally effective comprehension. 

2. Studying (reading) the material for a 
period of time equal to the length of time re- 
quired for verbally presenting the material 
resulted in significantly greater comprehen- 
sion when compared with a single read- 
through or auditory presentation. 







3. Reading and hearing simultaneously the 
material was more effective than either read- 
ing the material through once or hearing the 
material but not significantly different from 
the results of studying the material. 


In general, then, for the conditions of the
experiment, since reading is more rapid for
one-time acquaintance with material, reading
is the preferred method. If as much time is
available for reading as for auditory
presentation, significantly more information
may be obtained by reading.


Received October 3, 1955. 


References 


1. Bulfinch, T. Mythology. New York: J. M. Vent, 
1931. 

2. Henneman, R. H. Vision and audition as sensory 
channels for communication. Quart. J. Speech, 
1952, 38, 161-166. 







Role Perceptions of Successful and 
Unsuccessful Supervisors 


E. E. Ghiselli
University of California

and R. Barthol
Pennsylvania State University


Ghiselli and Brown have offered a concep- 
tual framework for describing an individual’s 
position in an organization, and his relation- 
ships with others (3). The position and the 
relationships of an individual are described 
in terms of prescribed and perceived roles. 
The prescriptions and perceptions are either 
those of the individual himself or those of 
others in the organization. Following New- 
comb, role prescriptions are thought of as ex- 
pectancies or anticipations of particular types 
of behavior (7). They constitute the formal 
characteristics of the individual’s role either 
as set by himself or by others. Role percep- 
tions, on the other hand, are taken to refer to 
the roles the individual sees himself as actu- 
ally fulfilling or the roles that others see him 
as actually fulfilling. 

One therefore can differentiate four types 
of roles: (a) self-prescribed roles (roles that 
the individual believes he should adopt), (b)
self-perceived roles (roles that the individual 
sees himself as actually filling), (c) roles pre- 
scribed by others (roles others expect the in- 
dividual to adopt), and (d) roles perceived 
by others (roles others see the individual as 
actually filling). The relationships among 
these roles are of some interest. For example, 
when there is little correspondence among 
roles, difficulties among individuals and groups 
may be anticipated since then individuals will 
not be behaving in expected ways. 

One of these relationships between roles is 
of special importance in the industrial situa- 
tion, that is to say, the relationship between 
self-perceived roles and roles perceived by 
others. This relationship involves the degree
to which the characteristics an individual believes
he possesses correspond with the behavior
that others believe typifies him. There
is particular significance to this relationship




when the viewing of the behavior, that is, the 
roles perceived by others, is done by manage- 
ment. This perception is contained either in 
formal statements, e.g., merit or performance 
ratings, or in informal opinions. In either 
event these views will be important deter- 
miners of the kinds of administrative action 
management will take concerning the indi- 
vidual. 

In a sense management is a self-perpetua- 
ting group. By and large it is the sole agent 
in the choice of its members, and it maintains 
or terminates the membership of an indi- 
vidual in it. To be sure, in a hierarchical 
organization, such as an industrial organiza- 
tion, management can be thought of as a se- 
ries of levels. Therefore, it is possible to dif- 
ferentiate ordered groups within management. 
The process of selection and maintenance of 
membership, however, necessarily would be re- 
peated from group to group as an individual 
progresses “up the ladder,” except that the 
top group traditionally has the power to make 
its wishes felt in every group below it. 

One therefore can ask the question, what 
qualities do subordinates see in themselves 
when their behavior is judged by higher man- 
agement to conform with that which manage- 
ment expects of them? In other words, what 
are the self-perceptions of individuals in lower 
management whose behavior higher manage- 
ment perceives as conforming to the stand- 
ards that higher management itself imposes? 

A review of the literature suggests that 
higher management and the workers do not 
agree on the qualities that make a good mid- 
dle management supervisor (4, 6). It fur- 
ther indicates that middle management does 
not necessarily recognize the kind of super- 
vision it is giving to line workers (1). This 
paper limits itself to the viewpoints of the 







line supervisor and his superior and does not 
include the presumably different viewpoint 
of the worker. 


Method and Procedures 


The ways of measuring or describing self-percep- 
tions are many. In recent years adjective check lists 
have seemed to give fruitful results because of the 
ease with which they can be interpreted. Some kind 
of adjective check list, then, was decided upon. 
There are certain problems, however, in the use of 
ordinary adjective check lists. With such lists the 
individual merely accepts or rejects a given adjective. 
Therefore he can reject items that place him in an 
unfavorable light. When there is reason to suspect 
that this tendency will occur among subjects, then 
some other device seems indicated. One way of 
overcoming this difficulty is to use the forced-choice 
method. In this procedure the individual is forced 
to choose between a pair of alternatives that are 
equally desirable or undesirable. By this means re- 
jection of all unfavorable items is avoided. 

One of the present writers developed a forced- 
choice adjective check list which shows considerable 
promise in minimizing the effects of faking (2). 
Hence, this instrument was adopted. It consists of
64 pairs of adjectives, both members of each pair 
referring to traits approximately equal in social de- 
sirability. Thirty-two of the pairs contain adjec- 
tives referring to desirable traits, and the remaining 
32 contain adjectives referring to undesirable traits. 
For the former, the respondent chooses the alterna- 
tive he believes most describes him, and for the 
latter he chooses the one he believes least describes 
him. 

The inventory was completed by 267 persons, all 
of whom were first-line supervisors. In order to ob- 
tain as wide a sample as possible, cases were drawn 
from seven different organizations distributed geo- 
graphically from the far east to the far west. In- 
cluded were four groups of industrial foremen and 
supervisors numbering, respectively, 63, 24, 22, and
20; and three groups of office supervisors number- 
ing, respectively, 91, 26, and 21. 

The persons in each group were rated by their 
superiors. A different rating scale was used in each 
different organization. The numbers of steps on 
these scales ranged from two to sixty, but in every 
case the ratings dealt directly with the degree to 
which the individual was effective in performing his 
job as a first-line supervisor. In some of the organi- 
zations the ratings were made by a single individual, 
while in others they were the average of the ratings 
of two or three persons. The ratings were accom- 
plished either by the supervisors’ immediate superiors 
or by superiors two levels higher. On the basis of 
these ratings by their management, the supervisors 
within each group were divided into high and low 
subgroups. The attempt was made to divide each 
group into half, but owing to the distributions of 
ratings it was impossible to achieve this exactly. 


The final two subgroups formed by all cases com- 
prised 157 cases rated high and 110 cases rated low. 


Results 


Using the two groups of high- and low- 
rated supervisors an item analysis of the 
forced-choice inventory was performed. Of 
the 64 paired adjectives, 18 pairs, or a little 
better than one out of four, differentiated be- 
tween the high- and low-rated supervisors at 
the 5% level of significance or better. These 
pairs are given in Table 1. The word on the 
left was selected by the high-rated supervisors 
and the word on the right was selected by 
the low-rated supervisors. With the forced- 
choice technique, any item that discriminates 
between two groups necessarily consists of 
two alternatives, one applying to the first 
group and the other applying to the second 
group. As indicated earlier, on those items
involving socially desirable traits the respondent
indicates which alternative he believes
most characterizes him, and on those items
involving socially undesirable traits he indicates
which least characterizes him. In Table
1 these two types of items are grouped separately.


Table 1

Items Differentiating High- and Low-Rated Supervisors

Superior Supervisors          Inferior Supervisors

See themselves as:            See themselves as:

energetic                     ambitious
loyal                         dependable
kind                          jolly
planful                       resourceful
clear-thinking                efficient
enterprising                  intelligent
progressive                   thrifty
poised                        ingenious
steady                        sociable
appreciative                  good-natured
responsible                   reliable

Do not see themselves as:     Do not see themselves as:

noisy                         arrogant
affected                      moody
shallow                       stingy
unstable                      frivolous
nervous                       intolerant
opinionated                   pessimistic
self-pitying                  hard-hearted
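
The article does not name the statistic used in the item analysis. One common way to test whether a forced-choice pair separates two groups is a chi-square test on the 2 x 2 table of group by alternative chosen, which the sketch below assumes. The counts in the example are hypothetical; only the group sizes (157 high-rated, 110 low-rated) come from the text. The critical value 3.841 is the 5% point of chi-square with 1 df.

```python
CRITICAL_5PCT_DF1 = 3.841  # chi-square, 1 df, 5% level

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, without continuity correction, for [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator

def pair_differentiates(high_first, high_total, low_first, low_total):
    """Return (flag, chi-square) for one adjective pair.

    high_first / low_first: how many in each group chose the first adjective.
    """
    chi2 = chi_square_2x2(high_first, high_total - high_first,
                          low_first, low_total - low_first)
    return chi2 >= CRITICAL_5PCT_DF1, chi2

# Hypothetical counts for a single pair, using the group sizes from the text.
print(pair_differentiates(100, 157, 50, 110))
```

With 64 pairs each tested at the 5% level, about three (64 x .05 = 3.2) would be expected to reach significance by chance alone, which gives a baseline for the 18 pairs reported.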

The adjectives in the first column of Table
1 give those self-perceptions of superior supervisors
which differentiate them from inferior
supervisors. The adjectives in the
second column give the reverse picture. Attempts
to form a total picture of an individual
or a group from a list of checked
adjectives inevitably bring about a certain
amount of disagreement. Nevertheless we
have attempted to form integrated pictures of
the self-perceptions of the two groups from
the adjectives checked by them.

The “good” supervisor sees himself as active,
purposeful, and forward looking. He is
favorably disposed toward his company and
identifies himself with his job. He views his
responsibilities broadly, that is, as having a
job to do rather than a series of assigned
tasks. He feels that he must exercise certain
independence of thought and action: plans
and decisions are an integral part of his work
and cannot be left solely to his superiors.
His orientation toward production is through
people. He sees himself as respecting the
rights and dignity of others, but is somewhat
reserved. He considers himself to be stable
and to display an evenness of temperament.
He feels that he is worthy of the respect and
confidence of others and that other people
can trust him. One gets the over-all impression
of maturity and calmness.

The most outstanding self-perception of the
“poor” supervisor is his sales approach to human
relations. He sees himself as a good fellow
who is well liked but he does not show
any need to understand and respect others.
His chief concern seems to be the impression
he makes on others. He seems to have a narrow
approach to his job and sees himself as
being highly skilled in carrying out instructions.
He gives no indication of leadership
qualities, but instead relies on his own ingenuity
and intelligence to complete a job.
He tends to be self-oriented rather than company-oriented;
his efforts are for his own ends
rather than those of the company. He does
believe, however, that he possesses the qualities
that management could well use to advantage.

These descriptions are in accord with the
findings that poor supervisors are more production-oriented
than are good supervisors
(5). It is our interpretation that the poor
supervisor tends to view production as an end
in itself and as his personal responsibility.
The good supervisor tends to view production
as a means to an end (over-all company
success) and to see his main responsibility as
working with the people who are the direct
producers.


Discussion 


The generalizations drawn from the ob- 
tained results are based on at least two as- 
sumptions: (a) The self-perceptions shown 
by the supervisors are approximately in ac- 
cord with the perceptions of higher manage- 
ment of these same supervisors. That is to 
say, higher management sees the good super- 
visors as having the same qualities that the 
good supervisors see in themselves. The 
same is true for the poor supervisors. (b)
The differences in the self-descriptions reflect 
the qualities that distinguish the good super- 
visors from the poor supervisors. 

With these assumptions in mind we can 
offer some conclusions concerning the role 
prescriptions of higher management for first- 
line supervisors. Higher management ap- 
proves of those supervisors whose attitudes 
seem to be similar to those traditionally held 
by higher management. We referred earlier 
to the hierarchical groups that comprise an 
industrial organization. This study supports 
the notion that the members of the lower 
echelons who are like the members of the 
higher echelons are most likely to win ap- 
proval. This is probably one of the fac- 
tors that leads toward stability in organiza- 
tions since, although the individual members 
change, the attitudes and approach would 
tend to remain the same. 

Higher management wants the lower level 
supervisors to have initiative and energy. 
The supervisors should be willing to assume 
responsibility, not only for implementing in- 
structions, but for deciding what must be 
done in order to carry out the mission of the 







organization. Management is not looking for 
the old-fashioned driving kind of supervisor 
who bullies his men to do a job, nor for a 
supervisor who tries to operate on the basis 
of popularity and friendliness. The super- 
visor who respects subordinates and super- 
visors alike, and furthermore, who views his 
own self-respect and integrity as important, 
is apparently approved by management. 

It is reasonable to ask why the poor su- 
pervisor is the way he is. He does not will- 
fully try to be a poor supervisor. He per- 
sistently does the wrong thing, and this is 
possibly because he thinks these behaviors are 
expected of him. We assume that there is 
some foundation for his beliefs and that they 
arise from a misinterpretation of the expecta- 
tions of higher management. The authors 
suspect that part of the trouble arises from a 
misunderstanding of the precepts in current 
thinking about proper supervision. Among 
these precepts we might find the following: 
The good supervisor (a) has the good will of 
his subordinates, (b) does his job with intelligence
and ingenuity, (c) is reliable and
conscientious, (d) wants to succeed, (e) 
“sells” his orders rather than dictates them. 


Presumably no one would quarrel with
these statements. A re-examination of Table
1 shows that the poor supervisor sees himself
as having all of these qualities and yet higher
management does not approve. Two major
things seem to be missing: (a) respect for
other individuals, and (b) identification with
the job. We hypothesize that higher management
and the good supervisor are ego-involved
in their jobs, while the poor supervisor
views his job as a way to make a living.
We further hypothesize that current manage- 
ment training programs frequently mislead 
some supervisors by presenting human rela- 
tions as a combination propaganda and sales 
technique without making it clear that other 
human beings are involved. The dignity of 
the other is fundamental to effective human 
relations. It is perhaps not so much a tech- 
nique as an attitude. 


Received October 11, 1955. 


References 


1. Fleishman, E. A. The measurement of leadership
attitudes in industry. J. appl. Psychol., 1953,
37, 153-158.

2. Ghiselli, E. E. The forced-choice technique in
self-description. Personnel Psychol., 1954, 7,
201-208.

3. Ghiselli, E. E., & Brown, C. W. Personnel and
industrial psychology. (2nd Ed.) New York:
McGraw-Hill, 1955.

4. Halpin, A. W. The leadership behavior and combat
performance of airplane commanders. J.
abnorm. soc. Psychol., 1954, 49, 19-22.

5. Katz, D., et al. Productivity, supervision, and
morale among railroad workers. Ann Arbor:
Survey Research Center, Univer. of Michigan,
1951.

6. Moore, J. V., & Smith, R. G., Jr. Some aspects
of non-commissioned officer leadership. Personnel
Psychol., 1953, 6, 427-443.

7. Newcomb, T. M. Social psychology. New York:
Dryden, 1950.






Job Expectancy and Survival 


Joseph Weitz 


Life Insurance Agency Management Association 


Introduction 


One problem constantly plaguing research- 
ers in the area of opinion and attitude meas- 
urement is that of causality. In job satisfac- 
tion studies, for example, one is never sure 
whether workers are high producers because 
they are happy or are happy because they 
are high producers (if indeed there is any re- 
lationship). Similarly, where termination- 
survival is used as a criterion, we do not 
know if some unfavorable attitude led to 
termination or if it is a rationalization after 
termination has been decided upon. 

One method of evaluating such relation- 
ships is to study the variables experimentally. 
In some cases it is possible to introduce a 
variable into a situation and see if it has the 
predicted effect. Such is the nature of the 
study reported here. 

In an investigation (1) of job satisfaction 
of life insurance agents, we found that those 
agents who said the manager misrepresented 
the job or job possibilities during the hiring 
interview were more likely to terminate than 
those who did not agree with this statement. 
From other data, we also found that new 
agents having a realistic job concept were 
more likely to survive than those whose job 
expectancy was not as accurate. 

From these two pieces of information, the 
hypothesis was made that when potential 
agents are given a clear picture of their job 
duties, they are more likely to survive on the 
job. 

Procedure 


This study was done in one insurance company. 
A questionnaire was devised asking for the approxi- 
mate amount of time spent in each of a number of 
different job activities such as collecting, servicing, 
prospecting, selling, etc. These questionnaires were 
sent to all agents of the company with a request 
that they be completed and returned to the Life In- 
surance Agency Management Association. The re- 
sults for each question were tallied and the median 
number of hours was computed for each activity.
All length of service groups were combined since we
were interested in an approximate composite picture. 
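The tallying step described above can be sketched in a few lines. This is only an illustration: the activity names come from the text, but the reported hours and the helper name `median_hours` are invented, since the questionnaire returns themselves are not given.

```python
from statistics import median

# Hypothetical questionnaire returns: hours per week reported by each agent.
# Activity names follow the text; every number here is invented.
responses = [
    {"collecting": 12, "servicing": 6, "prospecting": 8, "selling": 10},
    {"collecting": 15, "servicing": 4, "prospecting": 5, "selling": 12},
    {"collecting": 10, "servicing": 8, "prospecting": 9, "selling": 9},
    {"collecting": 14, "servicing": 5, "prospecting": 7, "selling": 11},
]


def median_hours(returns):
    """Return the median reported hours for each activity across all returns."""
    activities = returns[0].keys()
    return {a: median(r[a] for r in returns) for a in activities}


summary = median_hours(responses)
for activity, hours in summary.items():
    print(f"{activity}: {hours} hours")
```

All length of service groups are pooled into one list here, which mirrors the composite picture the study was after.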

From the results of this questionnaire, a booklet 
was made up consisting of a brief introduction stat- 
ing that the hours shown for each activity in the 
booklet were approximate but should give the ap- 
plicant a fair idea of how he would be spending his 
time if he were hired for the job. The rest of the 
booklet consisted of sketches showing an agent en- 
gaged in each of the various activities, a brief de- 
scription of the activity, and the approximate num- 
ber of hours agents currently employed spent in 
each activity. 

The company supplied us with a list of their dis- 
tricts (offices), the number of agents in each office, 
the number of terminators per district, and the 
number of open debit weeks¹ for the preceding
year. Matches were made by district, taking into 
account the geographical location of the district, the 
termination rate, and the number of open debit 
weeks. 

With juggling, we were able to obtain quite good 
matches. The termination rate in each group was 
the same and the average number of open debit 
weeks was 43 and 48. 

By flipping a coin, we decided which group was 
to be the experimental group and which the con- 
trol. It turned out that the group with 52 districts 
was the experimental, and the control group would 
be that containing 51 districts. All applicants in 
the experimental group would receive the job de- 
scription booklet, no one in the control group would 
receive the booklet. 

1 A district is composed of a number of debits. A
debit includes a specified number of policyholders
living in a particular geographical area (several
blocks or miles) in which the agent is to collect
premiums and sell. Each agent has his own debit.
If an agent terminates, his debit is “open” until a
new agent is hired in that district for that particular
debit. The length of time the debit is open is
measured in what is called “open debit weeks.” Of
course, some agents may be hired for new debits.

The mechanics of this for the experimental group
were as follows:

All applications for the job of agent go to the
home office. In the case of those persons filing an
application in any of the experimental districts, the
home office would send the following letter to the
prospective agent:


“We recently received your application for employment
as an agent with our company and want
you to know how pleased we are that you are considering
a career with company X. It is our feeling
that the life insurance business offers a fine career
to the man who is qualified for it. Because of this,
our responsibility for bringing into our company
men who have the best chance of succeeding is serious
and of prime importance.

“Very likely you are uncertain as to whether you 
should enter the life insurance business. Similarly, 
we are also uncertain as to whether you should or 
not. To the end of fulfilling our responsibility as 
stated above, and helping you make your decision, 
we would like you to read the enclosed booklet. We 
are sending this to your home so that you can study 
it at your leisure and to give you the opportunity 
of discussing it with your family. 

“The booklet describes the job of X company’s 
agent. The company wants you to know in ad- 
vance, insofar as it is possible, exactly the kind of 
work our agents do. Frankly, if this is not the kind 
of work you want to do, we want you to find it 
out now rather than later. If it is the kind of work 
you want to do, well and good. You can discuss 
further the possibility of a position with the man- 
ager who took your application. Either way, your 
action will be based on a clear concept of our job, 
which we feel very deeply is the proper way to make 
a decision. 

“Our best wishes go to you for a successful future, 
whether it be with our company or another organi- 
zation. 

Sincerely,” 


This letter was accompanied by a copy of the 
booklet. 

This procedure, of course, was not carried out
with applicants in the control districts.


Results 


The study continued for six months, start- 
ing in May and ending in October. Two 
hundred and twenty-six agents were hired 
during this period in the experimental group 
and 248 in the control. Nineteen per cent of 
the agents in the experimental group terminated
during this period, whereas 27% of
the control group terminated. This is sig- 
nificant beyond the 5% level using a one- 
tail test. 

More significant perhaps is the fact that 
the differences in termination rate for the 
two groups held up month after month. That 
is to say, if we determine the percentage of 
termination for each group hired in each 
month and exposed until October, we obtain 
the results shown in Table 1. 

As might be expected, the monthly termi- 
nation rate decreases as the last month (Oc- 
tober) of the study is approached. The rea- 
son, of course, is that there is less exposure of 
the men hired later in the study; that is to 
say, they have a shorter time in which to 
quit. For each month, however, it can be 
seen that a higher proportion of the control 
group terminated. Over all, there was a re- 
duction in termination of about 30%, a 
meaningful statistic to a company. 
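The significance statement and the 30% figure can be checked with a short calculation. The sketch below makes two assumptions that are not in the text: the article does not name its test, so a pooled two-proportion z-test is used as a stand-in, and the termination counts (43 and 67) are back-computed from the rounded percentages of 226 and 248 hires.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_two_proportion_test(x1, n1, x2, n2):
    """Return z and the one-tailed p-value for H1: proportion 2 > proportion 1."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Assumed counts: 19% of 226 is about 43; 27% of 248 is about 67.
z, p = one_tailed_two_proportion_test(43, 226, 67, 248)

# Relative reduction in termination, using the reported percentages.
relative_reduction = (0.27 - 0.19) / 0.27

print(f"z = {z:.2f}, one-tailed p = {p:.3f}")
print(f"relative reduction = {relative_reduction:.0%}")
```

Under these assumptions z comes out a little above 2 and the one-tailed p-value near .02, in line with "beyond the 5% level"; the relative reduction of 8 points on a base of 27 is the source of the "about 30%" figure.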

In order to check on the possibility that
giving a clear picture of the job to prospective
agents might make it more difficult to
hire a man, the proportion of open debit
weeks (how long it takes to fill a vacancy)
was determined for the experimental and
control groups. You might expect that if it
were more difficult to hire a man who was
given a clear picture of the job, the experimental
group would have a higher proportion
of open debit weeks. This was not the case.
The experimental group had 7.8% open debit
weeks while the control group had 8.9% open
debit weeks for the six-month period of the
study. While this difference is not significant,
it is opposed to the expected direction.
We can conclude that the booklet certainly
did not slow up the hiring procedure.


Table 1

Termination Rate for Persons Hired in Each Month of the Study

                     Experimental                         Control
             N      N Terminated       %           N      N Terminated       %
Hired In   Hired   Through October  Terminated   Hired   Through October  Terminated

May                      13             32                     21             47
June                     11             34                     19
July                      7             25                     10
August                    9             18
September                 2              5
October                   1              3

If we examine the termination rate in the 
two groups of agents unaffected by the book- 
let, that is, those hired before the start of the 
study, we find that there is no significant dif- 
ference. There were 796 agents on the job 
in the experimental districts and 706 in the 
control districts as of the end of April. We 
determined the termination rate of these “on- 
the-job” agents during the six months of the 
study and found that in the experimental 
group 27% terminated, and 28% terminated 
in the control group. This would lend more 
weight to any differences we find in the 
groups of agents involved in the study since 
apparently our earlier matches held up. All 
in all it appeared that something was effec- 
tive. 


Discussion 


The reason we say it appeared that some- 
thing was effective, rather than the job de- 
scription booklet, is this. The home office 
contact, via the letter accompanying the 
booklet, may have been part of the reason
the system worked. This procedure perhaps
created a favorable impression and resulted 
in higher survival in the experimental group. 

This variable could be controlled in further 
studies by issuing the booklet at the point of 
application (but would the manager issue the 
booklet?), or by having the home office send 
out a “public relations” letter to applicants 
without mentioning the job description. 

There are always many things you would 
like to do to “purify” your findings. One 
must not, however, in industrial work, purify 
to the point of sterilization. 


Conclusion 


We feel that this study shows that giving 
prospective agents a realistic concept of the 
job and having this description come from an 
“executive” source will reduce termination. 
We further found that this procedure will not 
make it more difficult to hire new agents. 


Received March 2, 1956. 


Reference 


1. Weitz, J., & Nuckols, R. C. Job satisfaction and 
job survival. J. appl. Psychol., 1955, 39, 294- 
300. 







The Use of a Sentence Completion Test in Measuring 
Attitudes Toward Superiors and Subordinates ¹


Leroy S. Burwen 


Research Division, Chicago Tribune 
Donald T. Campbell 
Northwestern University 

and Jerry Kidd ²
Ohio State University 


This paper reports on an effort to use a 
sentence-completion test to measure attitudes 
toward superiors and subordinates which 
might help predict the behavior of an inter- 
mediate in the face of conflicting demands 
from those who supervise him and those 
whom he supervises. 


Method 


The data were collected from Air Force cadets, 
tested on two occasions. The initial testing was 
during the first week of their preflight training at 
Lackland Air Force Base. Some of these same 
cadets, and others, were tested one year later when 
in advanced training. 

The test booklet asked the respondent to “Finish 
the following sentences with the first thought that 
occurs to you. There are no right or wrong an- 
swers. Work as rapidly as possible.” Of the items 
in the test, 24 were deemed scorable for all re- 
spondents on the dimension of superior-subordinate 
orientation. In general, responses favorable to su- 
periors, favoring strict discipline, and the like have 
been interpreted as superior-oriented.³ The responses
were rated on a five-point scale. To convey better 
the nature of the test and its scoring, the 24 items 
are presented here, along with paraphrased responses 
received from the tested population and their rated 
values. These responses have been selected to indi- 
cate the range of responses made, as well as the 
scoring system. 


1 This study was supported in part by the United 
States Air Force under contract No. AF 18(600)-170 
monitored by the Crew Research Laboratory, Air 
Force Personnel and Training Research Center, Ran- 
dolph Air Force Base, Randolph Field, Texas. Per- 
mission is granted for reproduction, translation, pub- 
lication, use and disposal in whole and in part by 
or for the United States Government. 

2 At the time of this study, the first author was at- 
tending the University of Chicago, and the third 
author Northwestern University. 

3 The authors had the advantage of experience 
with a previous scoring of the Lackland tests by 
Desmond Cartwright. 


1. In comparing civilian life with army life, he felt:
    restricted in his freedom (2)
    more secure (4)
    content and safe (4)

2. When the commanding officer called him, he thought:
    I have fouled up and will get chewed (2)
    general curiosity (3)
    I will be commended or given a pass (4)

3. He liked to be with a leader who:
    was easy-going, was well liked (2)
    knew his job (3)
    knew how to get the work out (4)
    was efficient and strict (4)

4. He never felt comfortable in the presence of:
    his superiors (1)
    a general (2)
    his men (5)
    enlisted men (5)

5. He felt the men over him were:
    boneheads (1)
    harsh (2)
    good leaders (4)
    the best (5)

6. The main trouble with the Air Force is:
    too many chiefs and not enough Indians (2)
    not strict enough discipline (4)

7. Whenever he saw his superior coming he:
    threw up (1)
    ducked or lied (2)
    saluted (3)
    gave him a warm greeting (4)
    was very happy (5)

8. The way to get along in the Air Force is:
    give the guys around you an even break (2)
    work hard (3)
    respect your C.O. (4)

9. He thought the men under him were:
    the best (1)
    foul balls (5)

10. When giving orders to an enlisted man:
    he was kind and understanding (1)
    he expected immediate compliance (5)

11. He thought the tough C.O.:
    was psycho (1)
    was the best (5)

12. In an argument with a superior:
    never tell him what you think (2)
    keep your place (4)
    be respectful (4)

13. The average enlisted man:
    is a good Joe (1)
    has a poor deal (2)
    is a slacker (4)
    shows no respect (5)

14. The difference between an enlisted man and an officer is:
    just rank, breaks, no difference (2)
    hard work, intelligence, quality (4)

15. When the officer pulled rank on him:
    he thought, what a jerk (1)
    he accepted it gracefully (5)

16. The lot of an enlisted man:
    isn’t too hot (2)
    is just what he makes it (4)

17. The difficulty in being an officer is:
    you can’t associate with enlisted men (1)
    keeping the men in line (4)

18. The status or rank system in the service makes for:
    injustice (2)
    better morale, better work (4)

19. When ordered to do something:
    he wanted to know why (2)
    he hopped to it (4)

20. What his men liked most about him was:
    he understood their side (2)
    he was firm (4)

21. A poor officer is one who:
    is “chicken” (2)
    is too slack (4)

22. The men under him disliked:
    the C.O. (2)
    his conduct as an officer (4)

23. When bucking for a promotion:
    don’t step on the other guy’s back (2)
    make sure the right people see you (4)

24. Military regulations:
    are for the birds (1)
    are an absolute must with me (5)


The ratings of these items had an interjudge re- 
liability of .89. The internal consistency reliability, 
as measured by a variant of the Kuder-Richardson 
formula (6, p. 223) is .69. For 48 men who had 
taken the test both in 1953 as preflight cadets, and 
in 1954 at the advanced training bases, the test- 
retest correlation was .12. 
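For readers unfamiliar with internal-consistency estimates, a minimal sketch follows. The text says only that "a variant of the Kuder-Richardson formula" was used and does not say which; coefficient alpha, the usual generalization to multi-point ratings, is assumed here purely for illustration. The ratings matrix is invented and much smaller than the 24-item test.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Coefficient alpha for a list of respondents, each a list of item ratings."""
    k = len(scores[0])
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Invented ratings on the five-point scale: 5 respondents by 4 items.
ratings = [
    [2, 3, 2, 4],
    [4, 4, 5, 4],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
    [5, 4, 4, 5],
]

alpha = coefficient_alpha(ratings)
print(f"alpha = {alpha:.3f}")
```

The estimate rises as items covary more strongly relative to their individual variances. It says nothing about stability over time, which is what the test-retest figure addresses.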


Results 


At the time the sentence-completion tests 
were administered in 1954, a considerable
variety of reputational criterion measures
were also obtained, some from administrative 
records, such as grades and Military Apti- 
tude Ratings, and others from nominations 
data collected by the project. The character 
and interrelationships among these criteria 
have been reported elsewhere (3). Correla- 
tions with all 13 of the available criteria were 
nonsignificant. All but one were below .06. 
The highest value was .13 with Flying Train- 
ing Grade based on an N of 225. This al- 
most reaches significance at the .05 level. 

The Sentence Completion score has also 
been correlated with other attitude measures 
included in the 1954 testing. Its correlation 
with the Leadership Knowledge (5) attitude 
score is .27, based on an N of 312, significant 
beyond the .001 level. The correlation with 
the Superior-Subordinate Cluster (4) is .32; 
with the Alienation cluster (4), -.45; with
the F scale (1), .01; and with Identification 
with Discipline (4), .25. All values above 
.19 are significant beyond the .001 level. 
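The significance statements in the two paragraphs above can be roughly checked. The article does not say which test was applied; the sketch below assumes a two-tailed test based on Fisher's z transformation with a normal approximation, and it assumes that all of the attitude-measure correlations rest on the same N of 312 reported for Leadership Knowledge.

```python
from math import atanh, sqrt
from statistics import NormalDist


def fisher_z_test(r, n):
    """Return (z, two-tailed p) for H0: population correlation is zero."""
    z = atanh(r) * sqrt(n - 3)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Reported values; N = 312 for the .19 threshold is an assumption.
z13, p13 = fisher_z_test(0.13, 225)   # Flying Training Grade
z27, p27 = fisher_z_test(0.27, 312)   # Leadership Knowledge
z19, p19 = fisher_z_test(0.19, 312)   # smallest value called significant

for label, p in [("r=.13, N=225", p13), ("r=.27, N=312", p27), ("r=.19, N=312", p19)]:
    print(f"{label}: two-tailed p = {p:.4f}")
```

Under these assumptions the .13 correlation falls just short of the .05 level, while .19 and above clear the .001 level, which agrees with the wording in the text.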


Discussion 


The values of .27 with Leadership Knowl- 
edge, and .32 with Superior-Subordinate clus- 
ter, taken with the correlation of .47 between 
the latter two (5), complete a triangulation 
which supports the “construct validity” of all 
three. The significant correlation of Sentence 
Completion and Superior-Subordinate Orien- 
tation Cluster with Identification with Disci- 
pline augments this picture. The high nega- 
tive correlation with the Alienation cluster 
does not help, however, since all but the Sen- 
tence Completion Test are independent of it. 
And, of course, the picture of construct va- 
lidity is weakened by the total absence for 
two of the three tests (Sentence Completion 
and Leadership Knowledge) of significant cor- 
relations with reputational measures intended 
to get at the same dimension. 


Summary 


A sentence-completion test designed to 
measure attitudes toward superiors and sub- 
ordinates was administered to 312 Air Force 
cadets in advanced training. The test was 
scored with acceptable reliability, and showed
a correlation of .32 with a direct attitude
measure of the same dimension, and of .27
with an indirect measure based on an information
test. Interpretation of these values
is restricted due to a correlation of -.45 with
a direct scale of alienation, and the absence 
of significant correlations with reputational 
criterion measures. 


Received October 13, 1955. 


References 


1. Adorno, T. W., Frenkel-Brunswik, Else, Levin- 
son, D. J., & Sanford, R. N. The authori- 
tarian personality. New York: Harper, 1950. 


2. Campbell, D. T. The indirect assessment of social
attitudes. Psychol. Bull., 1950, 47, 15-38.

3. Campbell, D. T. Intercorrelations among leadership
criteria on a population of Air Force
cadets. Unpublished draft research report
submitted for monitor’s approval. Jan., 1955.

4. Campbell, D. T., Burwen, L. S., & Chapman,
J. P. Assessing attitudes toward superiors
and subordinates through direct attitude statements.
Unpublished draft research report
submitted for monitor’s approval. Jan., 1955.

5. Campbell, D. T., & Damarin, F. Measuring leadership
attitudes through an information test.
Unpublished draft research report submitted
for monitor’s approval. Jan., 1955.

6. Gulliksen, H. Theory of mental tests. New
York: Wiley, 1950.







A Validation Study of the Prediction of College Achievement 


J. W. Frick 


University of Southern California 


and Helen E. Keener 


University of California, Santa Barbara College 


The senior author has previously presented 
the findings of an attempt to improve the pre- 
diction of college freshman academic achieve- 
ment by use of the Minnesota Multiphasic 
Personality Inventory (MMPI) in conjunc- 
tion with the usual scholastic aptitude test (1). 
The results indicated that the coefficient of 
determination derived from a weighted com- 
posite of aptitude and personality scores was 
.41, while that derived from aptitude scores 
alone was only .23. A regression equation 
was given for the prediction of grade-point 
average (GPA), based on an experimental 
group of 267 freshman women at the Univer- 
sity of California, Santa Barbara College. 

This study presents the results of an at- 
tempt at cross validation of the previous find- 
ings, in an additional sample of 200 freshman 
women at the same institution. 


Method 


The cross-validation group was selected on the
same basis as the experimental group, i.e., completion
of the two semesters of the freshman year, a
score on the American Council on Education Psy- 
chological Examination (ACE), and a research-valid 
score on the MMPI. Means and standard deviations 
on the relevant variables for the two groups were 
computed and compared (Table 1). Distributions 
and scatter plots of the scores on all variables for 
the validation group indicated a similarity to those 
found for the experimental group, with symmetry 
and linearity in all scales except GPA and the D 
scale of the MMPI. As was done with the experi- 
mental group, these two scales were normalized by 
T-scaling. 

Scores on each predictor variable for each indi- 
vidual of the validation group were then inserted 
into the following previously obtained regression 
equation, derived from the experimental group: 

X’ = 79.6 + .1476 ACE - .5490 Hs - .0125 D
- .9012 Pd + 1.0127 Pa - .4592 Sc - .4523 Ma.
In this equation, X’ represents the predicted GPA 
(in T scores) for the individual, while the terms on 
the right side represent the sum of a constant plus 
weighted individual scores on the ACE and selected 
MMPI scales. 
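As a hedged illustration of how the equation is applied to one set of scores, the sketch below codes the weights as a lookup table. It assumes the printed weights for Pd, Sc, and Ma are to be read with leading decimal points (.9012, .4592, .4523), and it uses the experimental-group means from Table 1 as a stand-in "individual", reading the Pa mean as 9.1. The function name `predict_gpa_t` is hypothetical.

```python
# Regression weights as read from the equation above; the decimal points on
# Pd, Sc, and Ma are an assumed reading of the printed figures.
CONSTANT = 79.6
WEIGHTS = {
    "ACE": 0.1476,
    "Hs": -0.5490,
    "D": -0.0125,   # D enters as a T score
    "Pd": -0.9012,
    "Pa": 1.0127,
    "Sc": -0.4592,
    "Ma": -0.4523,
}


def predict_gpa_t(scores):
    """Return the predicted GPA (T-score units) for one set of predictor scores."""
    return CONSTANT + sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Experimental-group means from Table 1 (Pa assumed to be 9.1).
experimental_means = {
    "ACE": 47.3, "Hs": 13.0, "D": 50.0, "Pd": 21.0,
    "Pa": 9.1, "Sc": 25.0, "Ma": 18.1,
}

predicted = predict_gpa_t(experimental_means)
print(f"predicted GPA (T score) at the group means: {predicted:.2f}")
```

Plugging in the group means gives a value close to the T-score mean of 50, which is what a least-squares equation should return at the predictor means and lends some support to the assumed decimal reading.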

After a predicted GPA had been computed for 
each individual, these predicted scores were
correlated with obtained GPA’s for the entire group of
200. Since the correlations between GPA and the 
other variables had been corrected for errors of 
measurement in the criterion (GPA) in the experi- 
mental group, this same procedure was followed for 
the cross-validation group. While the reliability of 
the GPA for the two semesters was known for the 
experimental, it was not available for the validation 
group, and was therefore estimated by the method 
suggested by Guilford (2). Since GPA had been T 
scaled for both groups, the SDs were identical, and 
therefore the estimate of reliability (corrected for two 
semesters) was the same for both groups. 
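The two corrections mentioned above can be sketched as follows. The article gives neither the reliability value nor the exact procedure taken from Guilford, so the sketch assumes the standard textbook forms: the Spearman-Brown formula to step a one-semester reliability up to two semesters, and the correction for attenuation in the criterion only. All numeric inputs are hypothetical.

```python
from math import sqrt


def spearman_brown(reliability, k=2):
    """Reliability of a measure lengthened k times (here, k semesters)."""
    return k * reliability / (1 + (k - 1) * reliability)


def correct_for_criterion_unreliability(r_xy, criterion_reliability):
    """Correct a validity coefficient for measurement error in the criterion only."""
    return r_xy / sqrt(criterion_reliability)


# Hypothetical inputs; neither value is reported in the article.
one_semester_reliability = 0.70
observed_r = 0.45

two_semester_reliability = spearman_brown(one_semester_reliability, k=2)
corrected_r = correct_for_criterion_unreliability(observed_r, two_semester_reliability)

print(f"two-semester reliability = {two_semester_reliability:.3f}")
print(f"corrected correlation    = {corrected_r:.3f}")
```

Because the same reliability estimate was applied to both groups, the correction scales both sets of coefficients by the same factor and does not by itself change the comparison between them.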

The zero-order correlation between GPA and ACE 
in the experimental group was found to be .48. 
These two measures were also correlated in the 
cross-validation group for purposes of comparison. 


Results 


The original multiple correlation between 
GPA and a weighted composite of ACE and 
MMPI scales, in the experimental group, was 
found to be .64. In the cross-validation 
group, the correlation between predicted and 
obtained GPA’s was .54. In the same group 
the zero-order correlation between GPA and 
ACE was .50. In this group, therefore, the 
coefficient of determination of .25 yielded by 
the ACE scores alone is slightly higher (.02) 
than that found in the experimental group, 
while the coefficient of multiple determina- 
tion of .30 is considerably (.11) lower. The 
shrinkage in correlation from the experimen- 
tal to the validation groups between the pre- 
diction variables and the criterion appears to 
have been subject to the regression phenome- 
non common to predictive equations derived 
by the least-squares method, and probably 
magnified by the use of seven independent 
variables in the equation. 
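The coefficients of determination quoted above are simply squared correlations, and the amount of shrinkage to be expected from fitting seven predictors can be gauged with a standard adjusted R-squared formula. That formula is not used by the authors and is offered only as a rough yardstick, since the reported correlations were also corrected for criterion unreliability.

```python
def determination(r):
    """Coefficient of determination for a correlation r."""
    return r * r


def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for n cases and k predictors (not the authors' method)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Correlations reported in the text.
r2_multiple_experimental = determination(0.64)
r2_ace_experimental = determination(0.48)
r2_predicted_crossval = determination(0.54)
r2_ace_crossval = determination(0.50)

# Experimental group: 267 women, seven predictors in the equation.
r2_adjusted = adjusted_r_squared(r2_multiple_experimental, n=267, k=7)

print(f"experimental: multiple {r2_multiple_experimental:.2f}, ACE alone {r2_ace_experimental:.2f}")
print(f"cross-validation: predicted vs. obtained {r2_predicted_crossval:.2f}, ACE alone {r2_ace_crossval:.2f}")
print(f"adjusted R-squared for the experimental group: {r2_adjusted:.3f}")
```

Squaring the rounded .54 gives about .29 rather than the reported .30, presumably because the authors worked from unrounded values. The formula-based adjustment is small next to the observed drop, which fits the authors' later suggestion that more than ordinary regression shrinkage is involved.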

It will be noted (Table 1) that the mean 
of the predicted GPA for the cross-validation 
group is quite close to the obtained mean, 
while means for the prediction variables in 
both groups are similar enough to warrant the 
assumption that both samples were selected 
from the same population. 







Table 1

Means, Standard Deviations, and GPAs, Both Groups

                  Experimental          Cross-Validation
                     Group                   Group
Prediction
Variables         Mean       SD         Mean       SD

ACE               47.3      27.74       48.0      26.45
Hs                13.0       2.93       13.2       2.87
D*                50.0      10.0        50.0      10.0
                 (18.4)     (4.34)     (18.2)     (3.22)
Pd                21.0       3.32       21.2       3.61
Pa                 9.1       2.52        9.4       3.12
Sc                25.0       4.32       25.4       4.31
Ma                18.1       3.75       19.1       4.02
GPA*              50.0      10.0        50.0      10.0
                (1.319)     (.505)    (1.317)     (.560)

GPA (Cross-Validation Group)

Mean predicted   Mean obtained   SD predicted   SD obtained
    49.4

* Normalized by T scaling. Original means and standard deviations in parentheses.


Discussion

Inspection of the data indicates that in
some cases there were deviations in obtained
GPA from that predicted, without concomitant
aberrant scores in any of the prediction
variables. This was especially true of a group
of 50 “problem” students, whose prediction
scores were within the normal range but
whose performance as measured by GPA was
extremely poor. Possibly certain personality
variables exogenous to the areas measured by
the MMPI are responsible for this deviation.
The authors would hazard a guess that these
deviant achievement scores arose from the inability
of some freshman women to adjust to
the college routine, social difficulties, and
various other frustrations to which the college
woman, more than the college man, appears
to be prone. It is also possible that
these difficulties had not yet made their presence
known at the time of matriculation,
when the ACE and MMPI were administered,
and therefore did not enter into the predictive
measures. Since such deviates appear in most
college populations, however, the authors did
not feel justified in excluding this particular
group from the study.

The expected shrinkage of the multiple-correlation
coefficient from the experimental
group to the validation group may be attributed
to (a) the regression phenomenon,
aggravated by the use of seven prediction
variables; (b) sampling errors in either or
both groups; (c) the tendency of the least-squares
method of computation of the multiple-correlation
coefficient to exploit any chance
relationships present; (d) one-fourth of the
cross-validation group being widely deviant
in performance but not in the predictive variables.

Summary 


1. A regression equation derived from the 
ACE and six clinical scales of the MMPI 
in an experimental group of 267 freshman 
women at the University of California, Santa 
Barbara College, was applied in the predic- 
tion of GPA to a similar cross-validation 
group at the same institution. 

2. The multiple-correlation coefficient be- 
tween prediction variables and GPA in the 
experimental group was .64. The correlation 
between predicted and obtained GPA in the 
cross-validation group was .54. Both coeffi- 
cients were corrected for errors of measure- 
ment in the criterion, with the reliability of 
the criterion estimated as the same for both 
groups. 

3. The shrinkage in the coefficient of de- 
termination from the experimental to the vali- 
dation group can be attributed to the regres- 
sion phenomenon, sampling errors, and the 
influence of variables not measured by the 
prediction scales. 


Received August 22, 1955. 


References 


1. Frick, J. W. Improving the prediction of aca- 
demic achievement by use of the MMPI. J. 
appl. Psychol., 1955, 39, 49-52. 

2. Guilford, J. P. Psychometric methods. (2nd Ed.)
New York: McGraw-Hill, 1954.







Predicting Grade-Point Average with a Forced-Choice 
Study Activity Questionnaire 


Genevieve Schutter and Howard Maher 


Iowa State College 


The importance of study skills and atti- 
tudes in academic achievement has been cited 
by counselors and by those responsible for 
study methods courses. Many attempts at 
measurement of study skills and attitudes 
have been made. A seeming weakness of some 
study tests is item transparency. Where 
items can be answered either yes or no or to 
some degree of applicability, the respondent 
can, if he wishes, indicate that all favorable- 
appearing items apply to him while denying 
unfavorable-appearing items. For situations 
such as college admission, required attend- 
ance in study correction courses, and others 
in which an important outcome may be partly 
determined by a study test score, it would 
seem desirable to have a test not so depend- 
ent upon honesty. For example, Holtzman 
and Brown (2) report a study questionnaire 
consisting of both skill and attitude items. 
While they obtain mean validity coefficients 
of .42 and .45 for men and women, respec- 
tively, their test manual includes the authors’ 
opinion that the predictive validity of the in- 
strument might be affected by the students’ 
desire to do well on the test. Scates (7), in 
a review of Wrenn’s Study Habits Inventory, 
raises another transparency issue. He notes 
that there are easy “outs” for the student. 
The student can check a large number of 
mechanical or external reasons for low grades, 
thus establishing a facade problem for the 
counselor. 

Another area of measurement that has been 
plagued by transparency is personnel rating. 
The forced-choice technique has been demon- 
strated more or less to control the biased re- 
sponse sets usually found with rating scales 
having readily apparent answers (8). Since 
the forced-choice technique appears success- 
ful in reducing transparency and since there 
is obviously little limit to the biasing possible 
on the usual free choice test, the present 
study proposes to investigate the use of the
forced-choice technique in the study test context.


Procedure 


As a first step in the development of the test, 600 
phrases and statements were selected from “How I 
Study” essays written by 150 sophomore students. 
To keep assignments within a reasonable limit, 300 
of these items were randomly selected for use in the 
present study. Next, the 300 items were classified 
independently by six expert judges (three psychol- 
ogy department staff members and three graduate 
students with experience in counseling) into Skill 
and Attitude categories. Items were accepted into 
the categories when agreed upon by at least five of 
the six judges. As a final step, at this stage of con- 
struction, 99 freshman and sophomore students in 
psychology classes were asked to indicate the ex- 
tent to which each statement described them on a 
scale of five degrees. 

Of the students, 50 were overachievers academi- 
cally and 49 were underachievers. They were se- 
lected as follows. A regression line to predict two- 
quarter grade-point average from ACE-L score was 
drawn. This line was based on data from entering 
students in 1953. A scatter plot of students’ ACE-L 
and grade-point averages was made, and the 50 stu- 
dents who were at least seven-tenths of a standard 
error of estimate above the regression line were se- 
lected as overachievers. The 49 underachievers were 
at least seven-tenths of a standard error of estimate 
below the regression line. 
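The selection rule described above can be sketched as follows. The data are invented, the helper names are hypothetical, and the standard error of estimate is computed with n - 2 in the denominator, which is an assumption since the article does not give its formula. As in the text, the line is fitted on one set of students and then used to classify others.

```python
from math import sqrt
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def standard_error_of_estimate(x, y, slope, intercept):
    """Root mean squared residual about the line, using n - 2 degrees of freedom."""
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return sqrt(sum(r * r for r in residuals) / (len(x) - 2))


def classify(ace, gpa, slope, intercept, see, margin=0.7):
    """Label a student by distance from the line in units of the standard error."""
    residual = gpa - (intercept + slope * ace)
    if residual >= margin * see:
        return "over"
    if residual <= -margin * see:
        return "under"
    return "neither"


# Invented reference data standing in for the 1953 entering students.
ref_ace = [20, 40, 60, 80, 100]
ref_gpa = [1.0, 1.6, 1.9, 2.6, 2.9]
slope, intercept = fit_line(ref_ace, ref_gpa)
see = standard_error_of_estimate(ref_ace, ref_gpa, slope, intercept)

# Invented candidates: (ACE-L score, two-quarter grade-point average).
candidates = [(50, 2.0), (70, 2.0), (60, 2.05)]
labels = [classify(a, g, slope, intercept, see) for a, g in candidates]
print(labels)
```

Students within seven-tenths of a standard error of the line on either side are left out, which is how the two criterion groups are kept apart.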

The mean response (on the five-point scale) for 
each item was computed for the high and low 
groups. The algebraic difference between the mean 
of the high group and the mean of the low group 
for each item was designated the discrimination 
index. The mean response of the two groups com- 
bined was designated the preference index for that 
item. 
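The two item indices can be illustrated with a short sketch. The responses are invented. One reading is assumed here: "the mean response of the two groups combined" is taken as the mean over all respondents pooled, rather than the average of the two group means.

```python
from statistics import mean


def item_indices(high_responses, low_responses):
    """Return (discrimination index, preference index) for one item."""
    discrimination = mean(high_responses) - mean(low_responses)
    preference = mean(high_responses + low_responses)  # pooled over both groups
    return discrimination, preference


# Invented five-point responses to one statement.
overachievers = [5, 4, 4, 5]
underachievers = [2, 3, 2, 1]

discrimination, preference = item_indices(overachievers, underachievers)
print(f"discrimination index = {discrimination:.2f}")
print(f"preference index     = {preference:.2f}")
```

A large positive discrimination index marks a statement endorsed more by overachievers, while the preference index reflects how favorable the statement looks to respondents generally. Pairing items on the second index and separating them on the first is the basis of the block construction that follows.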

Thirty blocks of five statements each were as- 
sembled using the preference and discrimination in- 
dices and Skill and Attitude categories. Of the five 
statements, two were equally favorable in appear- 
ance as determined by equal preference values. One 
of these, having a comparatively large discrimination 
index in favor of the high group was designated the 
Favorable Valid Statement; the other having a 
smaller discrimination index was designated the Fa- 
vorable Nonvalid Statement. Two more of the 
statements were equally unfavorable in appearance, 
determined by equally low preference values. One 
of these had a comparatively large negative
discrimination index and was designated as the
Unfavorable Valid Statement. The other had a smaller
discrimination index and was designated as the Un- 
favorable Nonvalid Statement. The distance be- 
tween the Valid and Nonvalid for both favorable 
and unfavorable statements was never smaller than 
five-tenths of a discrimination index, a number of 
preliminary tests having indicated that for item pairs 
chosen randomly this difference was either statisti- 
cally significant (.05) or closely approached signifi- 
cance. The fifth statement was designated as the 
Neutral Statement and had medium preference index 
and medium discrimination index. It may be recog- 
nized that this procedure constitutes the Richardson 
forced-choice system as described by Highland and 
Berkshire (3). 

In addition to the above method of fitting alter- 
natives into blocks according to their validity it was 
necessary to devise a scheme of arranging the Attitude-Skill
dimensions within the blocks. It was decided
to make this arrangement so that of the two dimensions
(Validity-Nonvalidity and Attitude-Skill)
only one would vary at a time. Thus, the subject 
would be required to operate in only one dimension 
at a time. 

Nineteen blocks were arranged according to the 
scheme in Table 1. 

Seven of the blocks were Form A; 12 were Form 
B. An additional eight blocks were constructed so 
that the Favorable and Unfavorable couplings were 
mixed with respect to S or A. These blocks were 
scorable for total score but not for S or A score. 
Within blocks the statements were arranged alpha- 
betically according to the first letter of the first 
word of the statement. Again, once all blocks were 
constructed, they were randomized for order of ap- 
pearance on the questionnaire by means of tables of 
random numbers. 
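
The ordering rules in this paragraph can be sketched as follows. This is only an illustration: the statements are invented, the function name `arrange_questionnaire` is hypothetical, and a seeded pseudo-random generator stands in for the printed tables of random numbers used in the study.

```python
import random

def arrange_questionnaire(blocks, seed=None):
    """Alphabetize statements within each block by the first letter of the
    first word, then randomize the order in which blocks appear."""
    rng = random.Random(seed)
    # sorted() is stable, so statements sharing a first letter keep their order
    arranged = [sorted(block, key=lambda s: s[0].lower()) for block in blocks]
    rng.shuffle(arranged)
    return arranged

# Hypothetical statements, invented for illustration only.
blocks = [
    ["Reviews notes before class", "Budgets study time", "Likes most courses"],
    ["Studies with the radio on", "Asks questions in class", "Crams before exams"],
]
questionnaire = arrange_questionnaire(blocks, seed=1956)
for number, block in enumerate(questionnaire, start=1):
    print(number, block)
```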

The instructions for taking the test required the 
students to pick two statements from each block: 
the one most like their study attitude or practice 
and the one least like their study attitude or practice.

Table 1
Scheme for 19 Blocks

     Block Form A               Block Form B
  Validity    Area           Validity    Area
  FV*         A              FV          S
  FN          A              FN          S
  N           A/S/NA         N           A/S/NA
  UV          S              UV          A
  UN          S              UN          A

* FV = Favorable Valid; FN = Favorable Nonvalid; N =
Neutral; UV = Unfavorable Valid; UN = Unfavorable Nonvalid;
A = Attitude; S = Skill; NA = No Area.

Three groups of freshman and sophomore subjects,
designated A, B, and C, were selected from psychology
classes. Groups A and B were used as validation
groups. Group C was retained as the cross-valida- 
tion group. Twenty-six per cent of group A, 19 
per cent of group B, and 24 per cent of group C 
were from the original criterion groups which filled 
out the earlier questionnaire. Each group had 50 
students who were below or on the regression line 
to predict grade point from ACE-L score and 50 
who were above, making 100 students in each group 
in a continuous grade-point distribution. The three 
groups were selected so that within each group in- 
dividuals were matched on ACE-L score, sex, and 
distance above and below the regression line. As 
much as possible, individuals were also matched 
among the three groups on the same variables. That 
matching was effective is shown by the fact that, 
by F-ratio test, there were no significant differences 
among the samples A, B, and C as regards grade- 
point average, ACE-L score or interactions. At the 
same time, within groups there was a significant dif- 
ference on grade-point average but not on ACE-L 
score. 

The forced-choice answer sheets for groups A and 
B were used in making the scoring key. The first 
step was to identify statements more often chosen 
by overachievers and, conversely, those more fre- 
quently chosen by underachievers. Consequently, 
for each statement the percentage of persons in the 
upper half of group A who chose it as most like 
them and again as least like them was obtained, as 
was the percentage of persons in the lower half of 
group A choosing it as most and, again, as least like 
them. This same procedure was followed for the 
upper and lower halves of group B. The critical 
ratios of the differences of these percentages of upper 
and lower halves of group A were read from Mosier 
and McQuitty’s nomograph (4). The same pro- 
cedure was followed for group B. These two sets of 
critical ratios were transformed to probabilities and 
the two sets of probabilities combined into a com- 
pound probability via Baker’s Tables (1). Weights 
were assigned to alternatives in the blocks according 
to the following scheme: 





Compound Probability        Weight
2% < P% ≤ 5%
1% < P% < 2%
0% < P% < 1%


Finally, the ten least valid blocks, in terms of num- 
ber of differentiating statements were discarded. 
Positive weights were assigned when the choices of 
an alternative were greater for the upper group, 
negative weights in the opposite condition. 
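
The keying steps can be illustrated numerically. The sketch below is not the authors' procedure: the study read critical ratios from Mosier and McQuitty's nomograph and combined probabilities with Baker's tables, neither of which is reproduced here. The code instead assumes the ordinary critical ratio for a difference between two independent percentages and, as a stand-in for the tables, Fisher's method of combining two probabilities. All counts are hypothetical.

```python
import math

def critical_ratio(k_upper, n_upper, k_lower, n_lower):
    """Critical ratio (z) for the difference between two independent
    proportions, using the pooled proportion for the standard error."""
    p_upper = k_upper / n_upper
    p_lower = k_lower / n_lower
    pooled = (k_upper + k_lower) / (n_upper + n_lower)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_upper + 1 / n_lower))
    return (p_upper - p_lower) / se

def two_tailed_p(z):
    """Two-tailed normal probability of a critical ratio."""
    return math.erfc(abs(z) / math.sqrt(2))

def fisher_combine(p1, p2):
    """Fisher's method for two independent probabilities (an assumed
    stand-in for Baker's tables): chi-square = -2 ln(p1 * p2) on 4 df,
    whose upper-tail probability reduces to p1*p2*(1 - ln(p1*p2))."""
    product = p1 * p2
    return product * (1 - math.log(product))

# Hypothetical counts: how many of 50 upper-half and 50 lower-half students
# marked one statement "most like me" in groups A and B.
z_a = critical_ratio(30, 50, 18, 50)
z_b = critical_ratio(28, 50, 20, 50)
compound = fisher_combine(two_tailed_p(z_a), two_tailed_p(z_b))
print(f"z_A = {z_a:.2f}, z_B = {z_b:.2f}, compound P = {compound:.4f}")
```

Because two-tailed probabilities ignore direction, a check that both groups differ in the same direction would be needed before a positive or negative weight is assigned.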

In an effort to determine the relative importance 
of the mechanics of studying and attitudes toward 
study, the answer sheets for group C were scored 
twice again. One scoring was made for the Skill 
statements and the other for the Attitude statements. 







Results 


Reliability. The assumption of homoge- 
neity of blocks was not warranted since the 
blocks had different total weights and differ- 
ent proportions of Skill, Attitude, and No 
Area statements. In an effort to obtain more 
equivalent subtests, a modified form of the 
odd-even method of computing reliability was 
used. The blocks were ranked according to 
weights in the three categories of Skill, Atti- 
tude, and No Area; the odd-even blocks of 
this ranked order constituted the two subtests. 
The reliability coefficients were found to be 
.62, .70, and .72 and when stepped up by the 
Spearman-Brown formula were .76, .82, and 
.83 for groups A, B, and C respectively. 
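
The Spearman-Brown step-up for a test of doubled length can be checked with a few lines. The function name is introduced here for illustration. The results agree with the reported coefficients to within .01; the small differences presumably reflect rounding of the half-test coefficients before publication, which is an assumption.

```python
def spearman_brown(r_half, k=2.0):
    """Spearman-Brown prophecy formula: reliability of a test k times as
    long as the one whose reliability is r_half (k = 2 for split halves)."""
    return k * r_half / (1.0 + (k - 1.0) * r_half)

# Odd-even coefficients reported for groups A, B, and C.
half_test_r = {"A": 0.62, "B": 0.70, "C": 0.72}
stepped_up = {group: spearman_brown(r) for group, r in half_test_r.items()}
for group, r in stepped_up.items():
    print(group, round(r, 3))
```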

Validity. The validity of the test was com- 
puted by means of a Pearson product-moment 
coefficient of correlation between test score 
and the cumulative grade-point average for 
fall and winter quarters. These coefficients 
were .58, .51, and .36 for groups A, B, and 
C, respectively. As previously indicated,
weights for the scoring key were based upon
the responses of groups A and B. Group C,
the cross-validation group, shows the usual
effects of shrinkage when the test is applied
to a new sample. Even so, the shrinkage
would appear to be slight and the coefficient
of .36 remains significantly different from
zero. For N = 100 an r of .25 is significantly
different from zero at the .01 level.

An important consideration in the validity 
of the study test is its relationship to intelli- 
gence. It will be recalled that, in the con- 
struction of the test, an effort was made to 
match high and low scholastic groups on 
ACE-L score. It was hoped that this match- 
ing would result in a predictive validity for 
the study test independent of intelligence. 
For group C, the cross-validation group, the 
correlation of ACE-L score with grades is .41, 
the study test validity is .36, and the intercorrelation
of ACE-L score and test score is only
.07. From these figures it would seem that 
the efforts to control for intelligence were 
successful and that the two tests are prac- 
tically independent measures. Some increase 
in this low intercorrelation might be antici- 
pated for future samples where controls for 
ACE-L score are not exerted. For the present
group C, however, a combination of the
two tests predicts grades better than either
one alone. The two zero-order correlations
when combined yield a multiple correlation
coefficient $R_{1.23} = .53$,
where:


1 = Cumulative grade-point average 
2 = ACE-L score 
3 = Forced-choice Study Test 


Thus, while the best single predictor is the 
ACE-L score, the addition of the study test 
raises the prediction by .12 over that of 
ACE-L alone. It would appear, therefore, 
that the use of the study test would make for 
prediction of scholastic achievement over and 
above that obtainable with ACE-L score 
alone. 
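
The multiple correlation can be reproduced from the three zero-order coefficients with the standard two-predictor formula. The sketch assumes that .07 is the correlation between the two predictors (ACE-L score and study test), which is the reading that reproduces the reported value; the function name is hypothetical.

```python
import math

def multiple_r(r12, r13, r23):
    """Multiple correlation of criterion 1 with predictors 2 and 3:
    R^2 = (r12^2 + r13^2 - 2*r12*r13*r23) / (1 - r23^2)."""
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r_squared)

r12 = 0.41  # grades with ACE-L score
r13 = 0.36  # grades with forced-choice study test
r23 = 0.07  # ACE-L score with study test (assumed reading of the text)

R = multiple_r(r12, r13, r23)
gain = R - r12
print(f"R = {R:.2f}, gain over ACE-L alone = {gain:.2f}")
```

With predictors this nearly independent, R squared is close to the simple sum of the two squared validities.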

Skill and attitude components of the test. 
Since the test was composed of two distinct 
types of items, the skills or mechanics of 
study, and attitudes toward study, it is ap- 
propriate to investigate whether one appeared 
to be more valid than the other. As previ- 
ously indicated, Skill or Attitude alternatives 
were scored only if the other member of the 
pair was in the same (Skill or Attitude) area. 
A coefficient of correlation of .59 between 
scores on the 12 Skill pairings and the 14 
Attitude pairings suggests that the two types 
of items tend to vary together, and that a 
high score on Attitude is likely to be associated
with a high score on Skills. In terms
of comparative validity also, there would ap- 
pear to be little choice. The correlations 
with grades are found to be .28 and .23 (sig- 
nificant at .05) for Attitude and Skill scores, 
respectively. From all data taken together it
appears that both Skill and Attitude state- 
ments are about equally valid, but are not 
making independent contributions to the total 
validity. 

Other relationships. Are there significant 
score differences between men and women 
and between freshmen and sophomores? To 
determine the extent of relationship of sex 
and class membership with scores, two point- 
biserial correlations were computed. The 
point-biserial coefficient of correlation of test 
score with sex is .13, which is not significantly 
different from zero at the .05 level. It would 
appear that, although no attempt was made
to control for sex item differences at earlier
stages of development, no total score sex dif- 
ferences have been introduced into the test. 
The point-biserial coefficient of correlation 
of test score with class (r = .14) was not sig- 
nificantly different from zero at the .05 level. 
The test is thus probably suitable for use 
with either freshman or sophomore groups. 
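
For readers unfamiliar with the statistic, a point-biserial coefficient is the product-moment correlation between a continuous score and a two-valued variable such as sex or class. A minimal sketch follows; the scores and group codes are hypothetical and the function name is introduced only for illustration.

```python
import math
from statistics import mean, pstdev

def point_biserial(scores, group):
    """Point-biserial r between scores and a 0/1 group code:
    r = (M1 - M0) / s * sqrt(p * q), where s is the standard deviation of
    all scores (divisor N), p the proportion coded 1, and q = 1 - p."""
    ones = [s for s, g in zip(scores, group) if g == 1]
    zeros = [s for s, g in zip(scores, group) if g == 0]
    p = len(ones) / len(scores)
    q = 1 - p
    return (mean(ones) - mean(zeros)) / pstdev(scores) * math.sqrt(p * q)

# Hypothetical test scores and group codes (not data from the study).
scores = [12, 15, 9, 20, 14, 7, 18, 11]
group = [1, 1, 0, 1, 0, 0, 1, 0]
r_pb = point_biserial(scores, group)
print(f"point-biserial r = {r_pb:.3f}")
```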


Discussion 


As indicated previously, the shrinkage from 
validation samples to the cross-validation 
sample appears slight. However, the cross 
validity, in terms of forced-choice scales, is 
somewhat disappointing. Moreover, consid- 
ering the individual blocks, the results are 
not up to par. The probability weights are 
generally below those found with other 
forced-choice scales. Again, the attrition of 
blocks from the questionnaire to the forced- 
choice stage is rather great, 8 of the 30 blocks 
tested carrying no weight for any of the al- 
ternatives. And of the 20 blocks accepted 
for the final form, 3 carry only one weight. 

We are led to the speculation that some of 
this washing out of blocks may be occasioned 
by the nature of the criterion used to select 
block alternatives. In the first place, the ex- 
perimental design poses as a target only that 
portion of the criterion variance not ac- 
counted for by ACE-L score. Furthermore, 
a continuous distribution of under- and over- 
achievement was used as the criterion. Lesser 
prediction might be expected in that some of 
the discrepancy between predicted and actual 
achievement lies close enough to the regres- 
sion line to be merely a “chance” miss. 

Nevertheless, in spite of lessened validity, 
the experimental design is still, in the opin- 
ion of the authors, the preferred one. Fail- 
ure to control for intelligence in the criterion 
may lead to a loading of a study test with 
intelligence items. Again, even though vali- 
dation on continuous rather than extreme 
group criteria may have led to greater block 
attrition, surely, in everyday application of 
a test, the convenience of extreme groups can- 
not be expected. 

Another disturbance may have arisen from 
the method used to assemble the forced-choice 
blocks. The rather small difference between 
the discrimination indices of the valid and
nonvalid statements was employed in an attempt
to reduce the transparency of the
blocks. This, conceivably, could also reduce 
the validity of the blocks by “splitting the 
votes” of both good and poor students be- 
tween the valid and nonvalid alternatives. 
Probably, in future investigations, basic re- 
search is needed to indicate the optimal dis- 
crimination distances for validity, reliability, 
and nontransparency. 

Additional reduction in validity may have 
been produced by our attempt to get the atti- 
tude and skill measurements into the scale. 
Too few items were sufficiently agreed upon 
by judges to permit enough blocks to be as- 
sembled under the demands of forced-choice 
construction, i.e., discrimination, preference, 
and, in this case, attitude and skill require- 
ment. Probably a greater number of items 
should have been tried in the original ques- 
tionnaire. 

With all of the above, however, the test as 
it now stands has, at least, usable validity. 
Should the low interrelationship with intelli- 
gence test score hold for future samples, the 
test could be expected to add to the multiple 
prediction of college achievement. Further- 
more, the investigation has demonstrated the 
applicability of the forced-choice instrument 
in, to the best of our knowledge, a new area. 
There is an indication in the data that the 
same techniques used in the construction of 
forced-choice rating scales and tests will 
carry over in this context. For instance, the 
pairing of alternatives on discrimination index 
appeared helpful. Of the 31 weighted alter- 
natives in the final scale, 26 were the dis- 
criminating items when the original question- 
naire was analyzed. This would seem to in- 
dicate that the computation of discrimination 
indices is an important step in design of the 
forced-choice block. That it is not a com- 
pletely sufficient step is evidenced by the 
fact that, had all valid statements been 
scored as originally weighted, at least 40 
statements would have been weighted, i.e., 
FV and UV statements in each of 20 blocks. 
Also had these been weighted for both most 
and least descriptive responses, there would 
have been 80 scorable statements instead of 
the 31 found to be differentiating. 

The data do show, however, one difference
between forced-choice rating scales and study
tests. In the rating situation the Neutral 
Statement has sometimes functioned as a 
suppressor statement (6), i.e., a favorable- 
appearing item with negative weight for a 
“most descriptive” response or one unfavor- 
able in appearance weighted negatively when 
denied. Only one Neutral Statement served 
as a suppressor in the present investigation 
and carried only unitary weight. It would 
seem likely, therefore, that the suppressor is 
not functional in this context. A possible 
reason lies in the nature of the students’ re- 
sponses; it is clear that they do not resist un- 
favorable alternatives to the extent that raters 
do. These latter deny favorable alternatives 
or admit unfavorable ones less frequently 
(5). In this case 32 per cent of the responses 
are of this nature. Richardson (6) has con- 
tended that the suppressor works by allowing 
the rater to “damn with faint praise.” The 
underachieving student, in describing himself, 
seems to be praising with loud damns, thus 
finding little use for suppressors. 

Finally, the alternatives discriminating good 
from poor students are of some interest. The 
good student generally characterizes himself 
as using most of the techniques recommended 
in methods courses—i.e., trying to distinguish 
important from unimportant points, survey- 
ing before studying, time budgeting, etc. He 
would also appear to have a singleness of 
purpose and a serious attitude toward grades. 
The underachiever, on the other hand, would 
seem to have primarily serious motivational 
difficulties. Secondly, he marks a cluster of 
alternatives indicating that strong social in- 
terests operate to the detriment of his grade- 
point average. In fact, whereas the over- 
achiever would appear to have both high 
motivation and efficient technique, the poor 
student would appear handicapped mainly in 
terms of motivation. 


Summary 


In an attempt to introduce the lesser trans- 
parency of forced-choice technique into the 
study test area, preference and discrimina- 
tion indices were first computed from the re- 
sponses of 99 over- and underachievers to 
300 attitude, skill, and unclassified items.
Thirty Richardson-type forced-choice blocks
were next submitted to 300 students. Two 
groups of 100 students each were used as 
item validation groups. The cross-validity,
obtained on a third group of 100 students,
was found to be r = .36, while the corrected
odd-even reliability was r = .83. Skill and
Attitude statements appeared to contribute 
about equally to the validity. The total test 
scores did not correlate significantly with sex 
or class (year) membership. 

Discussion of the results has centered about 
possible reasons for the lesser validity of the 
forced-choice technique in this test as com- 
pared with other areas. The use of a continu- 
ous distribution of discrepancy between ACE- 
L score and college achievement, the method 
of assembling the forced-choice blocks, and 
the reduction of the usable population of 
items by attempts to satisfy many require- 
ments are advanced as hypotheses. Again, 
differences between forced-choice responses 
on rating scales and the test are examined, 
leading to the hypothesis that suppressor al- 
ternatives may not be functional in the lat- 
ter. Finally, the data provide word pictures 
of under- and overachieving students. 


Received October 31, 1955. 


References 


1. Baker, P. C. Combining tests of significance in
cross validation. Educ. psychol. Measmt,
1952, 12, 300-306.

2. Brown, W. F., & Holtzman, W. H. Brown-Holtzman
SSHA manual. New York: Psychological
Corporation, 1953.

3. Highland, R. W., & Berkshire, J. R. A methodological
study of forced-choice performance
rating. USAF Hum. Resour. Res. Cent., Res.
Bull., 1951, No. 51-9.

4. Mosier, C. I., & McQuitty, J. V. Methods of
item validation and abacs for item-test correlation
and critical ratio of upper-lower difference.
Psychometrika, 1940, 5, 57-65.

5. Richardson, M. W. An empirical study of the
forced-choice performance report. Paper read
at Amer. Psychol. Ass., Denver, Sept., 1949.

6. Richardson, M. W. Forced-choice performance
reports; a modern merit-rating method. Personnel,
1949, 26, 205-210.

7. Scates, D. E. Study habits inventory. (Review)
In O. K. Buros (Ed.), The third mental measurements
yearbook. New Brunswick, N. J.:
Rutgers Univer. Press, 1949, 566-568.

8. Sisson, E. D. Forced-choice—the new army rating.
Personnel Psychol., 1948, 1, 365-381.





The Journal of Applied Psychology 
Vol. 40, No. 4, 1956 


Fakability of a Forced-Choice Personality Test Under 
Realistic High School Employment Conditions 


Leonard V. Gordon 
U. S. Naval Personnel Research Field Activity, San Diego 


and Ernest S. Stapleton 


Albuquerque Public Schools 


The relative ease with which the job ap- 
plicant can falsify his responses in the con- 
ventional personality questionnaire has raised 
doubt as to the practical utility of this type 
of instrument in employment situations. It 
has been hoped that the forced-choice tech- 
nique would reduce, to some extent, the abil- 
ity or tendency of the applicant to give a 
more favorable impression of himself under 
these circumstances. 

Two general approaches have been used in 
the development of forced-choice personality 
tests. In one, only certain items are keyed 
in and it is assumed that the applicant who is 
out to make a good impression will have diffi- 
culty in determining what these items are. 
In the other, all items are keyed in, but on 
different scales. The applicant who is moti- 
vated to falsify would be faced with the task 
of identifying these scales and deciding which 
scales are important for the job. 

Longstaff and Jurgensen (3) have reported 
a fakability study on the Classification In- 
ventory, a forced-choice test which uses the 
first of these approaches. A group of stu- 
dents were asked to assume that they were 
applying for a job when taking the Inven- 
tory. At the next class meeting the Inven- 
tory was taken again with the students asked 
to assume that they were applying for voca- 
tional guidance. Means on the self-confi- 
dence key were not significantly different for 
the two administrations, and a correlation 
between scores of .50 was obtained. 

¹ The opinions and conclusions expressed herein
do not necessarily reflect the opinions of the Chief
of Naval Personnel or the Department of the Navy.

Rusmore (4) used the same design with
the Gordon Personal Profile, a forced-choice
test that has each item keyed in on one of
four different scales. Means on three of the
scales were not significantly different between
the simulated job applicant and guidance ad- 
ministrations. Significant mean differences, 
in favor of the job applicant administration, 
were obtained on the Responsibility scale 
and on the Total score, a measure of the 
number of favorable responses. These in- 
creases, equivalent to about 9 and 8 percentile 
points, respectively, indicated that “individu- 
als have a slight tendency to show themselves 
to better advantage in the Industrial selection 
situation.” Correlations between administra- 
tions for the four traits ranged from .64 to 
.79, indicating that “subjects did not change 
their profile patterns substantially from one 
set of directions to the others.” 

These two studies show, at most, very mod- 
erate increases and fairly high correlations 
between scores in simulated employment and 
guidance situations on forced-choice tests. 
Longstaff and Jurgensen (3), however, in an- 
other simulated situation, found that indi- 
viduals oversold themselves substantially un- 
der a more suggestive set of instructions. 
This influence of instructions in simulated 
employment situations points to the need of 
performing studies of this type under more 
realistic conditions, where any tendencies to 
falsify would be self-induced rather than ex- 
perimentally encouraged. In this manner the 
practical utility of using forced-choice per- 
sonality tests for employment purposes can 
be more fairly judged. 

The present study was performed to deter- 
mine what differences, if any, occur in forced- 
choice personality-test performance under two 
conditions where some actual differential mo- 
tivation to falsify may be assumed to exist. 
The design was similar to that used by Longstaff
and Jurgensen and by Rusmore, except
that the Guidance and Employment situations
had a greater degree of realism.

Table 1

Means and Standard Deviations Between Pretest and Retest Scores for the Guidance (N = 88)
and Employment (N = 121) Groups

                        Ascendency              Responsibility
Statistic          Guidance  Employment     Guidance  Employment
Pretest mean          2.0       2.9            3.7       --
Pretest SD            5.2       5.9            5.7       --
Retest mean           2.5       3.2            --        --
Retest SD             5.6       6.0            --        --
Mean difference        .5        .3            --        --
t                     --        --             --        --

** Significant at the .01 level.
-- = entry not legible in the source.

The Gordon Personal Profile (1), used in 
the study, is a brief four-factor personality 
test measuring Ascendency, Responsibility, 
Emotional Stability, and Sociability. It also 
yields a Total score which indicates the ex- 
tent to which the individual has selected com- 
plimentary rather than derogatory alterna- 
tives. 


Procedure 


Shortly after the beginning of the second semester, 
all junior and senior students in a small high school 
in Albuquerque² were administered the Gordon Personal
Profile after the following introduction:


“As part of our guidance program, we are asking 
you to fill out a form called the Gordon Personal 
Profile. We will use the information we get in any 
future counseling that we may do with you. Please 
consider it an addition to our student personnel 
services which we are developing for you.” 

² The writers wish to express their appreciation to
Dr. H. Lampman and Mr. G. Keppers of the Albuquerque
Public Schools, whose cooperation made
this study possible.

Table 1 (continued)

                    Emotional Stability        Sociability               Total
Statistic          Guidance  Employment     Guidance  Employment     Guidance  Employment
Pretest mean          4.7       5.3            3.6       --            14.0      17.2
Pretest SD            6.4       5.6            6.5       --            15.4      14.9
Retest mean           4.9       --             4.9       --            17.6      21.3
Retest SD             --        --             --        --            16.0      14.2
Mean difference       --        --             --        --             3.6       4.1

-- = entry not legible in the source.

Three months later, about two weeks before the
close of school, students were informed that applications
for outside employment would be accepted
through the municipal school system. This followed
an established practice of the Youth Employment
Service to attempt to place continuing students
in summer and part-time jobs and terminal
students in full-time jobs. Students were asked, in
their classrooms, whether they wished employment.
Those who did were given a specially devised employment
blank on which to indicate the type of
job desired, lowest salary acceptable, and other pertinent
information. The Personal Profile was then
readministered as an employment test. The follow- 
ing appeared printed on the employment blank: 

“If you desire employment at the end of the 
school year, and wish assistance in obtaining such 
employment, you are asked to fill in the information 
requested below. You will be asked to take the 
Gordon Personal Profile to provide a second copy 
to be appended to this employment form since the 
original copy is not available for this purpose. In- 
formation obtained from this form and from the 
Gordon Personal Profile will assist us in making 
more effective job placements.” 

Students who indicated that they did not wish 
employment were given a specially devised guidance 
blank to complete, primarily to occupy their time 
while the others were completing their employment 
blanks. They were asked about their attitudes to- 
ward particular school subjects, their educational 
plans, etc. This was followed by readministration 
of the Personal Profile for guidance purposes. The 
following was printed on the guidance blank: 

“You are asked to fill in the information requested 
below for vocational guidance purposes. You will
be asked to take the Gordon Personal Profile to provide
a second copy to be appended to this Vocational
Guidance form, since the original copy is on
file elsewhere. If at some later time you wish to
discuss your vocational problems, information ob- 
tained from this form and from the Gordon Per- 
sonal Profile will be of assistance in providing a 
better understanding of your goals and interests.”

In all, 209 students, 157 juniors and 52 seniors,
completed the test on both administrations. The 
Employment group contained 121 students, 65 boys 
of mean age 17.6 years and 56 girls of mean age 
17.5 years. The Guidance group contained 88 stu- 
dents, 34 boys of mean age 18.4 years and 54 girls 
of mean age 17.9 years. The first administration of 
the test was performed by the regular classroom 
teachers. The retest was performed by the Direc- 
tor of Guidance to provide a greater sense of realism. 





Table 2

Correlations Between Scores on the Pretest and Retest for the Guidance (N = 88)
and Employment (N = 121) Groups

Column order in the original: Ascendency, Responsibility, Emotional Stability, Sociability, Total

Pretest (guidance), retest (guidance):       .80   .84   .87   .84
Pretest (guidance), retest (employment):     .80   .68   .74   .79

One entry in each row is not legible in the source.




Results 


Means and standard deviations for all
traits measured, as well as Total score, and
tests of significance of differences between
means for the first and second administrations
are presented in Table 1.³ It may be
seen that both the Employment and Guidance
groups increased their means significantly
in the retest on Responsibility and
Total score. In addition, the Employment
group had a significant mean increase on
Emotional Stability and the Guidance group
on Sociability.

Correlations between the scores for the
first and second administrations are presented
in Table 2. These correlations, representing
test-retest reliabilities for the Guidance group,
with a three-month interval intervening,
range from .80 to .87. The correlations between
the guidance and employment administrations
range from .68 to .80 for the
Employment group. These correlations are
significantly larger, at the 5% level, for the
Guidance group on Responsibility and Emotional
Stability.

Using the Guidance group as a control
group, an analysis of covariance was run for
the four traits and Total score to determine
whether the increases in mean score for the
Employment group were greater than might
be expected from a guidance retest.

³ Since no significant sex differences in changes
from the first to the second administration were
noted, a single analysis is reported for both sexes.

Table 3

Analysis of Variance, with Covariance Adjustments for Guidance Pretest, of Guidance and
Employment Final Test Administrations

Trait               Source    df  Pretest SS      F  Product SS   Retest SS  Adjusted SS   df       F  Adjusted mean difference
Ascendency          Between    1       39.10     --       27.30       19.06          .65    1      --  --
                    Within   207     6585.14            5451.36     7105.83      2593.06  206
                    Total    208     6624.24            5478.65     7124.89      2593.71  207
Responsibility      Between    1       19.71     --       52.17      138.15        68.76    1      --  --
                    Within   207     5594.22            4344.19     5951.64      2578.15  206
                    Total    208     5613.93            4396.37     6089.79      2646.91  207
Emotional Stability Between    1       16.81     --       55.53      183.47       107.12    1      --  --
                    Within   207     7480.15            5808.41     6990.45      2480.16  206
                    Total    208     7496.96            5863.94     7173.92      2587.29  207
Sociability         Between    1       65.31     --      -21.57        7.12        91.53    1   6.25*  -1.41
                    Within   207     8329.89            7148.71     9151.10      3016.08  206
                    Total    208     8395.20            7127.14     9158.22      3107.61  207
Total score         Between    1      545.00     --      630.21      728.74        65.42    1      --  --
                    Within   207    47726.52           38558.95    46884.82     15732.49  206
                    Total    208    48271.52           39189.15    47613.56     15797.91  207

* Significant at the .05 level.
** Significant at the .01 level.
-- = entry not legible in the source.

Analysis of covariance data are presented 
in Table 3. An inspection of the first F 
column indicates that the two groups cannot 
be said to differ in initial mean score on any 
of the traits or the Total score, since none of 
these F tests are significant. 

When the retest means are adjusted for 
initial mean differences between the Employ- 
ment and Guidance groups, the Employment 
group is found to have a significantly greater 
mean on the retest than the Guidance group 
on Responsibility and Emotional Stability as 
indicated in the second F column in Table 3. 
The Guidance group has a significantly greater 
mean than the Employment group on Socia- 
bility. There is no difference in the retest 
performance of the two groups on Ascend- 
ency or Total score. 

The magnitude of the significant differences 
between the adjusted final means is 1.2 for 
Responsibility, 1.5 for Emotional Stability in 
favor of the Employment group, and 1.4 for 
Sociability in favor of the Guidance group. 
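
The covariance adjustment can be traced numerically for one trait. The sketch below takes the Sociability sums of squares and products to be the values shown; that assignment is an assumption about how the printed columns of Table 3 line up, supported only by the fact that it reproduces the one F ratio that is legible for this trait (6.25). The function name is hypothetical.

```python
def ancova_from_sums(pre_ss, product_ss, retest_ss, n_total, n_groups=2):
    """One-covariate analysis of covariance from sums of squares and products.

    Each argument is a dict with 'within' and 'total' entries.
    Adjusted SS = retest SS - (product SS)^2 / pretest SS, computed
    separately for 'within' and 'total'; adjusted 'between' is the difference.
    """
    adj_within = retest_ss["within"] - product_ss["within"] ** 2 / pre_ss["within"]
    adj_total = retest_ss["total"] - product_ss["total"] ** 2 / pre_ss["total"]
    adj_between = adj_total - adj_within
    df_between = n_groups - 1
    df_within = n_total - n_groups - 1  # one further df is lost to the covariate
    f_ratio = (adj_between / df_between) / (adj_within / df_within)
    return adj_between, adj_within, adj_total, df_within, f_ratio

# Sociability entries as read from Table 3 (column assignment assumed).
pretest = {"within": 8329.89, "total": 8395.20}
product = {"within": 7148.71, "total": 7127.14}
retest = {"within": 9151.10, "total": 9158.22}

adj_between, adj_within, adj_total, df_within, f_ratio = ancova_from_sums(
    pretest, product, retest, n_total=209
)
print(f"adjusted within SS = {adj_within:.2f} on {df_within} df, F = {f_ratio:.2f}")
```

The same function applied to the other traits would give their adjusted F ratios, which are not legible in this copy of the table.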


Discussion 


The significant increase in means on Re- 
sponsibility and Sociability for the Guidance 
group is somewhat surprising. The group is 
not atypical in that the pretest mean scores 
do not differ significantly from those of the 
Employment group on any of the traits and 
are similar to those reported for a national 
standardization sample of high-school stu- 
dents (6). Windle (5), in his review of test- 
retest effects on personality questionnaires, 
found mean increases in the direction of bet- 
ter adjustment to be fairly common on the 
retest. This occurred even when external 
variables had not been postulated as operat- 
ing to effect test-retest differences. Windle 
mentioned a number of possible intrinsic fac- 
tors that may have accounted for this phe- 
nomenon, but indicated that insufficient evi- 
dence is available to enable him to choose 
from among them. The writer must take the 
same position in being unable to account for
the increase made by the Guidance group in
the present study. 

The Employment group shows significant
mean increases over the Guidance group 
equivalent to about 8 percentile points in Re- 
sponsibility and about 10 percentile points in 
Emotional Stability. The magnitude of the 
obtained correlations between the guidance 
and employment administrations indicates 
that individuals do not substantially change 
their relative positions on the traits from the 
Guidance to the Employment testing. In 
general, these results, obtained in a realistic
high-school guidance and employment setting,
are very similar to those reported by Rus- 
more, for the same test, using simulated con- 
ditions with college students. 

In evaluating the present findings, two lim- 
iting factors should be noted. First, since 
the students knew that their original scores 
were on file elsewhere at the time they were 
retested on the Personal Profile, they may 
have been inhibited from faking as much as 
they might have in an initial employment 
administration. Secondly, since the students 
were not candidates for a specific job, but 
rather were indicating desires for particular 
types of work, their motivation to falsify 
might have been reduced or less specific.
Thus, while these results were obtained under 
one type of realistic employment conditions, 
a note of caution should be maintained re- 
garding their generality. The fakability of 
the present test, or forced-choice tests of its 
type, under actual industrial employment 
conditions remains to be determined. 


Summary 


1. The Gordon Personal Profile was ad- 
ministered to junior and senior students in a 
small high school for vocational guidance 
purposes. Three months later, at the end of 
the school year, the test was readministered 
to those students applying for jobs as an em- 
ployment test. Students not wishing jobs 
were given the test again as a guidance test. 

2. Using the Guidance retest group for con- 
trol purposes, significant increases of about 8 
percentile points on Responsibility and about 
10 percentile points on Emotional Stability
were obtained by the Employment group.
The Guidance group obtained a statistically 
significant increase over the Employment 
group in Sociability, equivalent to about 9 
percentile points. 

3. For the Employment group, correlations 
between scores on the guidance and employ- 
ment administrations ranged from .68 to .80. 

4. Thus, individuals did not change their 
profile patterns substantially from a guidance 
situation to an employment situation and 
mean increases for the group were found to 
be moderate. Since the present study was 
performed in a high school situation, how- 
ever, the generality of these findings to actual 
industrial selection remains to be determined. 


Received July 14, 1955. 


References 


1. Gordon, L. V. Gordon Personal Profile. Yonkers, New York: World Book, 1953.

2. Gordon, L. V. Manual, Gordon Personal Profile. Yonkers, New York: World Book, 1953.

3. Longstaff, H. P., & Jurgensen, C. E. Fakability of the Jurgensen classification inventory. J. appl. Psychol., 1953, 37, 86-89.

4. Rusmore, J. T. Fakability of the Gordon personal profile. J. appl. Psychol., 1956, 40, 175-177.

5. Windle, C. Test-retest effect on personality questionnaires. Educ. psychol. Measmt, 1954, 14, 617-633.

6. World Book Company. Special report to communities that participated in the cooperative testing program. Yonkers, New York: World Book, 1953.





The Journal of Applied Psychology
Vol. 40, No. 4, 1956


A Technique for Increasing the Reproducibility of 
Cumulative Attitude Scales * 


Allen L. Edwards 
The University of Washington 


Various procedures have been described for 
improving the reproducibility of cumulative 
scales designed to measure attitudes and opin-
ions (2, 3, 6, 7). The present study reports
upon the degree of reproducibility obtained 
when the method of paired comparisons is 
used in conjunction with a set of opinion 
statements with known scale values on a fa- 
vorable-unfavorable psychological continuum. 

Assume that N statements with respect to 
some issue have been scaled by the method 
of equal-appearing or successive intervals so 
that a scale value representing the degree of 
favorability of each statement is known. A 
smaller set of m statements is now selected 
from the larger group of N statements in such 
a way that the scale separations of the state- 
ments are approximately equal. In equal- 
appearing interval scales, respondents are 
presented with the set of m statements and 
asked to check whether they agree or disagree 
with each one. Scores on such scales are ob- 
tained by finding the median or mean of the 
scale values of the statements agreed with. 
It has been found, however, that attitude 
scales of the Thurstone equal-appearing in- 
terval variety, in general, have low coeffi- 
cients of reproducibility (2). 

Suppose, however, that each of the m state-
ments is paired with every other statement,
as in the method of paired comparisons. In 
each pair of statements, one statement will 
have a higher, or more favorable, scale value 
than the other. Let the statement with the 
higher scale value in each pair be designated 
as A and the statement with the lower scale 
value as B. These pairs of statements com- 
prise the items in the attitude scale to be 
evaluated. Respondents are asked to choose
the statement, A or B, in each pair that best
indicates how they feel about the issue under
consideration. Scores are obtained by count-
ing the number of times the respondent has
chosen the more favorable or A statement in
the set of m(m - 1)/2 paired comparisons.

1 This research was supported by a grant from the
Agnes Anderson Research Fund, Graduate School,
University of Washington, providing for the sta-
tistical analyses which were carried out by Doris
Dietze.

It may be hypothesized that a respondent’s 
choice in each of the AB pairs will be a func- 
tion of his own position on an unfavorable- 
favorable attitude continuum corresponding 
to the one on which the statements have been 
scaled. He will choose, in other words, that 
statement in each pair that is closer to his 
own position. The respondent’s position is, 
of course, unknown, and is to be determined 
from the choices he makes when confronted 
with the AB pairs of statements. If a re- 
spondent falls exactly half way between the 
scale values of a given AB pair, his choice 
should be a matter of chance and all such 
choices will contribute to the unreliability of 
the scores obtained from the scale and also 
reduce the degree of reproducibility of the 
item responses from the scores. 
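
The choice rule hypothesized above can be sketched in a few lines of Python. This is only an illustration under a strict nearest-statement assumption: the function name `paired_comparison_score` and the respondent positions used with it are hypothetical, and the nine scale values are the ones reported for the scale described in the next section.

```python
from itertools import combinations


def paired_comparison_score(scale_values, position):
    """Score a respondent located at `position` on the attitude continuum.

    Returns (score, undecided): the number of pairs in which the more
    favorable (A) statement is the closer one, and the number of pairs
    whose midpoint coincides with the respondent's position, where the
    choice is assumed to be a matter of chance.
    """
    score = 0
    undecided = 0
    for first, second in combinations(scale_values, 2):
        a, b = max(first, second), min(first, second)  # A is more favorable
        midpoint = (a + b) / 2
        if position > midpoint:
            score += 1       # respondent lies closer to A
        elif position == midpoint:
            undecided += 1   # exactly half way: chance response
    return score, undecided


# Scale values of the nine statements used in the study.
scale_values = [8.7, 7.8, 6.8, 5.8, 4.9, 4.1, 3.0, 2.0, 1.0]

# Hypothetical respondent positions, from very favorable to very unfavorable.
for position in (9.0, 5.0, 0.0):
    print(position, paired_comparison_score(scale_values, position))
```

Under this rule the score rises steadily as the respondent's position moves toward the favorable end, which is the property a cumulative scale requires.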


Description of the Scale 


A set of opinion statements relating to the 
introductory course in general psychology 
had been scaled by the method of equal-ap- 
pearing intervals and two Thurstone-type at- 
titude scales of 20 statements each had been 
developed by a class in the techniques of atti- 
tude-scale construction at the University of 
Washington in 1948. Equal-appearing inter- 
val scale values for the 40 statements were 
thus available. From the set of 40 state- 
ments, 9 statements were selected with scale 
values of 8.7, 7.8, 6.8, 5.8, 4.9, 4.1, 3.0, 2.0, 
and 1.0. High scale values correspond to the 
favorable end of the equal-appearing interval 
continuum and low scale values to the unfa- 
vorable end. 

Each of the 9 statements was paired with
every other statement to give 9(9 - 1)/2 =
36 pairs of AB statements or items. The
pairs of statements in the scale were ar- 
ranged so that for the odd-numbered pairs 
the first statement was the A, or more favor- 
able, statement. For the even-numbered pairs 
the second statement was the A, or more fa- 
vorable, statement. This arrangement was 
for scoring convenience and there is no evi- 
dence to indicate that the students subse- 
quently given the scale were aware of the 
ordering of the pairs of statements. 


Procedure and Results 


The scale was given to approximately 370 
students in the introductory psychology course 
at the University of Washington during the 
last two weeks of the spring quarter in 1953. 
The students were asked to choose the state- 
ment in each of the 36 pairs that best ex-
pressed how they felt about the introductory
course. They were not asked to sign their 
names to their papers in order to provide as- 
surance that their responses would have no 
influence on their grades in the course. 

Some students failed to respond to every 
item in the scale and their papers were dis- 
carded, leaving a total of 349 papers. These 
papers were divided into two groups of 175 
and 174 by taking alternate papers. All sta- 
tistical analyses to be reported were done 
with the first group of 175 papers and the re- 
sults then checked with the second group of 
174 papers. 

The 175 papers in the first group were 
scored by giving one point each time the stu- 
dent chose the more favorable or A statement 
in the 36 AB pairs. For each of the 36 pairs 
of statements, the proportion of favorable or 
A responses was then found by counting the 
number of students choosing the A statement 
and dividing by the total number of students. 
The items or pairs of statements were then 
arranged in rank order of the proportion of 
favorable responses and the predicted re- 
sponse patterns for each score were deter- 
mined in the manner described by Edwards 
(1). 

An error of prediction was counted each 
time an observed response to a given item 
failed to correspond to the predicted response 
for that item in terms of the score on all 
items. Predictions were made for a total of
(175)(36) = 6,300 responses, with 711 being
in error. The proportion of errors was .113
and the coefficient of reproducibility was 
equal to 1 — .113 or .887. The coefficient of 
reproducibility of .887 obtained with this set 
of 36 items compares favorably with the co- 
efficients of reproducibility customarily re- 
ported for attitude scales with many fewer 
items. 

For the same set of 36 items, the Kuder-
Richardson (5), formula 20, estimate of reli-
ability was obtained. This coefficient was
.869 and it also compares favorably with the 
reliability coefficients, reported by Edwards 
and Kenney (4), for attitude scales con- 
structed by the method of equal-appearing 
intervals and the method of summated ratings. 
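
For readers who want to reproduce this kind of estimate, the following is a minimal sketch of Kuder-Richardson formula 20 in Python. It assumes each paper is coded as a row of 0/1 item scores (1 when the A statement was chosen) and uses the population form of the score variance; the function name `kr20` and the toy data are illustrative only and are not the study's data.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for rows of 0/1 item scores.

    r = k / (k - 1) * (1 - sum(p * q) / variance of total scores),
    with the variance computed over all respondents (divisor n).
    """
    n_people = len(responses)
    k = len(responses[0])
    # Proportion of respondents scoring 1 on each item.
    p = [sum(row[j] for row in responses) / n_people for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    scores = [sum(row) for row in responses]
    mean = sum(scores) / n_people
    variance = sum((s - mean) ** 2 for s in scores) / n_people
    return (k / (k - 1)) * (1 - sum_pq / variance)


# Toy data: four respondents, three items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(toy), 3))
```

The sketch does not guard against a total-score variance of zero, which would make the coefficient undefined.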

In order to check the results obtained and 
reported upon above, the second set of 174 
papers was scored. The response patterns 
predicted for each of the scores in this group 
of papers were based upon the proportions of 
favorable responses obtained in the first set 
of 175 papers. The errors of prediction were 
thus obtained independently of any consid- 
eration of the proportions of favorable re- 
sponses given by the members of the second 
group. For the second group the proportion 
of errors was .121 and the coefficient of re- 
producibility was 1—.121 or .879. The 
Kuder-Richardson estimate of reliability for 
the second set of papers was .883. 
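
The error count and the cross-check on the second set of papers can be sketched as follows. The sketch assumes a simple prediction rule: for a score of s, the respondent is predicted to have chosen the A statement on the s items with the highest proportions of favorable responses. The procedure in Edwards (1) may differ in detail, and the function names and toy data here are hypothetical.

```python
def item_order_by_popularity(responses):
    """Item indices ranked from most to least often answered favorably."""
    n_items = len(responses[0])
    totals = [sum(row[j] for row in responses) for j in range(n_items)]
    return sorted(range(n_items), key=lambda j: totals[j], reverse=True)


def reproducibility(responses, item_order):
    """Return (errors, coefficient) for 0/1 rows under a fixed item order."""
    n_items = len(responses[0])
    errors = 0
    for row in responses:
        score = sum(row)
        predicted = [0] * n_items
        for j in item_order[:score]:
            predicted[j] = 1  # favorable response predicted on top `score` items
        errors += sum(p != o for p, o in zip(predicted, row))
    total = len(responses) * n_items
    return errors, 1 - errors / total


# Toy papers (rows of 0/1 responses), split by taking alternate papers.
papers = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 1], [1, 1, 0]]
first, second = papers[0::2], papers[1::2]

order = item_order_by_popularity(first)   # ordering fixed on the first group
print(reproducibility(first, order))
print(reproducibility(second, order))     # independent check on second group

# The arithmetic reported for the first group of 175 papers:
print(round(1 - 711 / 6300, 3))
```

Because the item order is taken only from the first group, the errors counted in the second group do not depend on that group's own response proportions.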

It has been found previously that state- 
ments with scale values in the “neutral” or 
middle section of the favorable-unfavorable 
equal-appearing interval continuum tend to 
contribute to error and thus to lower repro-
ducibility more than statements scaled to-
ward the two extremes of the continuum (2).
For this reason, it seemed worth while to 
check upon the value of the coefficient of re- 
producibility obtained when the two state- 
ments with scale values of 5.8 and 4.1 were 
eliminated from the set of 9 statements. Us-
ing only the 7(7 - 1)/2 = 21 paired com-
parisons, the two sets of 175 and 174 papers
were rescored. Response patterns and errors 
of prediction for the first group of 175 pa- 
pers were obtained as before. The coefficient 
of reproducibility for the 21-item scale was, 
as expected, somewhat higher and equal to
.914 for the first set of papers. The Kuder-
Richardson estimate of reliability was .829.

Using the proportions of favorable responses 
given in the first set of papers and the re- 
sponse patterns based upon these proportions, 
the errors of prediction for the second set of 
174 papers were obtained. For the second 
set of papers the coefficient of reproducibility 
was .904 and the Kuder-Richardson estimate 
of reliability was .861. 


Summary 


The results reported would seem to indicate 
that using the method of paired comparisons 
in conjunction with a set of opinion state- 
ments with known scale values on a favor- 
able-unfavorable continuum has promise for 
the construction of attitude scales with a rela- 
tively high degree of reproducibility and 
satisfactory reliability. 


Received November 10, 1955. 


References 


1. Edwards, A. L. On Guttman’s scale analysis. Educ. psychol. Measmt, 1948, 8, 313-318.

2. Edwards, A. L., & Kilpatrick, F. P. Scale analysis and the measurement of social attitudes. Psychometrika, 1948, 13, 99-114.

3. Edwards, A. L., & Kilpatrick, F. P. A technique for the construction of attitude scales. J. appl. Psychol., 1948, 32, 374-384.

4. Edwards, A. L., & Kenney, Katherine C. A comparison of the Thurstone and Likert techniques of attitude scale construction. J. appl. Psychol., 1946, 30, 72-83.

5. Kuder, G. F., & Richardson, M. W. The theory of the estimation of test reliability. Psychometrika, 1937, 2, 151-160.

6. Loevinger, Jane. The technic of homogeneous tests compared with some aspects of “scale analysis” and factor analysis. Psychol. Bull., 1948, 45, 507-529.

7. Stouffer, S. A., Borgatta, E. F., Hays, D. G., & Henry, A. F. A technique for improving cumulative scales. Publ. Opin. Quart., 1952, 16, 273-291.





The Journal of Applied Psychology 
Vol. 40, No. 4, 1956 


The Relationship Between Item Ambiguity and 
Discriminating Power in a Forced-Choice Scale * 


Eleanore S. Isard 


Temple University 


Controlling bias or misrepresentation in 
self-rating scales is a perennial problem in 
the field of measurement. One of the newer 
techniques—the forced-choice—attempts to 
handle this problem through a procedure of 
item collection based upon statements made 
by representative elements of the population 
for whom the scale is being built, and through 
pairing of items on the basis of two computed 
indices: Preference and Discrimination. 

Item collection for a forced-choice scale 
was discussed by Gekoski and Isard in a re- 
cent note on a new use of the sentence com- 
pletion technique (2). The problem of pair- 
ing items on the basis of equal apparent 
favorableness or social acceptability (Prefer- 
ence Index) and significance for the criterion 
(Discrimination Index) has received some 
attention in the literature. These reports 
have dealt essentially with the discriminat- 
ing power of positive versus negative or so- 
cially acceptable versus socially unacceptable 
items (3, 4, 6, 7). While this paper also con-
cerns itself with the relationship between
these two indices, the primary interest is 
with the discriminating power of so-called 
ambiguous items. 


Procedure 
Preference Indices 


Through the use of the essay method (8, 9), 
modified by critical incident (1), and the sentence 
completion method (2), a Student Questionnaire of 
284 statements of opinion toward school experience 
was constructed. Following several preliminary in- 
vestigations into the effects of size of sample and 
wording of instructions on item preference indices, 
the Student Questionnaire was administered to four 
samples of college freshmen and sophomores (total 
N = 84). These students were instructed to rate
each item on a five-point scale of social acceptability
ranging from 1 (Highly Acceptable), through 3
(Neutral), to 5 (Highly Unacceptable).

1 This paper is based in part on the writer’s Ph.D.
dissertation at Temple University (5). The writer
is indebted to Drs. Roy B. Hackman, Norman
Gekoski, and Harold C. Reppert for their support
and encouragement throughout this study.

A Preference Index was obtained for each item by
computing the mean scale value from the responses
of the total sample of 84 students. Prior to this,
application of t tests of the significance of the dif-
ferences in mean preference indices obtained with
each of the four subsamples indicated that the re-
quirement of stability had been met. Items having
a mean scale value (Preference Index) of 3.00 ± .50
were designated Neutral; those below these limits,
Positive; and those above, Negative. The latter
terms are used interchangeably with “Socially Ac-
ceptable” and “Socially Unacceptable,” respectively.
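
The Preference Index and its three categories can be illustrated with a short Python sketch. It assumes that the limits 2.50 and 3.50 themselves count as Neutral, which the text does not state explicitly; the function names and the sample ratings are hypothetical.

```python
def preference_index(ratings):
    """Mean of the 1-5 social acceptability ratings given to one item."""
    return sum(ratings) / len(ratings)


def preference_category(index, center=3.0, half_width=0.5):
    """Classify an item by its Preference Index.

    Assumption: an index lying exactly on a limit (2.50 or 3.50) is
    treated as Neutral.
    """
    if index < center - half_width:
        return "Positive"   # rated toward 1, Highly Acceptable
    if index > center + half_width:
        return "Negative"   # rated toward 5, Highly Unacceptable
    return "Neutral"


# Hypothetical ratings of three items by a handful of raters.
for ratings in ([1, 2, 2, 1], [3, 2, 4, 3], [4, 5, 4, 5]):
    index = preference_index(ratings)
    print(ratings, round(index, 2), preference_category(index))
```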


Discrimination Indices 


Additional samples were collected from the same 
college population for obtaining discrimination in- 
dices. The instructions in this step were that the 
students respond to each statement in the 284-item 
questionnaire only in terms of whether they per- 
sonally agreed or disagreed with it. The validating 
criterion was one-semester grade-point averages com- 
puted from the official transcripts of the participat- 
ing students. Two criterion groups, each containing 
50 subjects equated on college aptitude test per- 
centile rank, were established: (a) Achievers—those 
with grade-point averages of 2.00 (“C”) or higher, 
and (b) Nonachievers—those with grade-point av- 
erages of less than 2.00. Phi coefficients based on 
the item-analysis data for the 100 subjects were com- 
puted as indices of item validity. A phi coefficient 
of .26 or higher was tentatively established as the 
criterion for inclusion of a discriminating item in 
the forced-choice inventory. It was found that this 
represents the very rigorous standard of discrimina- 
tion at the .01 level based upon the chi-square test 
of significance. 
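
The link between the phi criterion and the chi-square test can be checked with a brief sketch. For a 2 x 2 table, chi-square equals N times phi squared, so with the 100 criterion subjects a phi of .26 gives a chi-square just above the 1-df critical value at the .01 level (6.635). The function names and the cell counts in the example table are hypothetical.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table.

    a, b: achievers agreeing / disagreeing with the item
    c, d: nonachievers agreeing / disagreeing with the item
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


def chi_square_from_phi(phi, n):
    """Chi-square (1 df) implied by a phi coefficient based on n subjects."""
    return n * phi ** 2


CHI2_CRITICAL_01_DF1 = 6.635  # tabled critical value, .01 level, 1 df

chi2 = chi_square_from_phi(0.26, 100)
print(round(chi2, 2), chi2 > CHI2_CRITICAL_01_DF1)

# Hypothetical item: 32 of 50 achievers and 19 of 50 nonachievers agree.
print(round(phi_coefficient(32, 18, 19, 31), 3))
```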


Forced-Choice Scales 


By matching discriminating items with nondis- 
criminating items on the basis of equal Preference 
Index (plus or minus .10), it was possible to con- 
struct two forced-choice inventories, Form AA and 
Form ML. These differed in instructions and scor- 
ing procedure. Form AA consisted of 15 tetrads 
and had an approximately equal number of Positive, 
Neutral, and Negative items. The subjects were in- 
structed to select from each tetrad the two items 
with which they Most Agreed. Form ML was a 
12-tetrad inventory consisting almost exclusively of 
so-called Neutral items. On this form, the students
were instructed to select from each tetrad one item
with which they Most Agreed and one with which 
they Least Agreed. An example of one tetrad from 
each of the forms appears below. For the reader’s 
convenience, preference and discrimination indices 
are presented beside each statement. The discrimi- 
nating items are asterisked. 


Example tetrads (the Preference Index and Discrimination Index (φ) columns are largely illegible; surviving values are noted beside each form)

Form AA (surviving index value: 1.89)

College standards should be at a level that will produce good students.
Reading and studying should be taught as subjects to college freshmen.
Textbooks should be written so that the average student can understand them without help.
The ideal test requires an application of facts and knowledge to practical situations.

Form ML (surviving index values: 2.86, 19)

If a student feels he has not been given a fair deal in a test, he should take the matter up with the Dean.
Most instructors prefer essay type tests so that they can have more leeway in marking them.
Most students will try to get away with as little work as they can.
Most textbooks are too long and dull.


Both forms were administered to two groups of 
30 Achievers and 47 Nonachievers. Mean scores 
were computed for each group on each form. As 
a result of the findings, further study of Form ML 
was undertaken with a new sample of 100 college 
freshmen and sophomores. In addition to retesting 
for reliability, a biasability study was performed 
with 39 highly motivated volunteer subjects. The 
instructions to bias were as follows: 

“Assume that the score you now obtain on this 
inventory will determine whether or not you are 
permitted to remain in college. In selecting your 
answers, be guided by the assumption that the Uni- 
versity will keep only those students who have atti- 
tudes toward the administration, the instructors, the 
student body, etc., that are like those that good stu- 
dents have expressed. Therefore, select from each 
tetrad, as being the item with which you most agree 
and the item with which you least agree, those which 
will place you in the most favorable light with the
University, i.e., those that you feel will agree with
the key based upon the attitudes of good students.”


Table 1
Preference Index Distribution of Discriminating and Nondiscriminating Items in the Student Questionnaire of Attitudes Toward School Experience

                      Preference Category
Items                 Positive   Neutral   Negative   Total
Discriminating            11        33        12        56
Nondiscriminating         94        67        67       228
Total                    105       100        79       284


Discussion and Results 


Table 1 shows the number of discriminat- 
ing and nondiscriminating items found in the 
Student Questionnaire for each of the three 
preference categories. 

In Table 1, χ² equals 17.8264, which is sig-
nificant at the .01 level. This indicates that
there is a highly significant relationship be-
tween type of item (Preference Index) and
discriminating power. It should be pointed
out that half of the contribution to χ² comes
from the Neutral category for discriminating 
items. In terms of percentages, of the 56 
discriminating items, 20% had been perceived 
as Socially Acceptable and 21% as Socially 
Unacceptable. The remaining 59% were, on 
face value, apparently Neutral. 
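
The chi-square for Table 1 can be recomputed from the cell counts (11, 33, and 12 discriminating items; 94, 67, and 67 nondiscriminating items). The sketch below is a plain Pearson chi-square for a contingency table; the function name `chi_square` is illustrative. The result falls close to the reported 17.8264, the small difference presumably reflecting rounding in the original hand computation, and the Neutral discriminating cell supplies about half of the total.

```python
def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows.

    Returns (chi2, contributions), where contributions[i][j] is the
    (observed - expected)^2 / expected term for cell (i, j).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        terms = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            terms.append((observed - expected) ** 2 / expected)
        contributions.append(terms)
    chi2 = sum(sum(terms) for terms in contributions)
    return chi2, contributions


# Rows: Discriminating, Nondiscriminating. Columns: Positive, Neutral, Negative.
table_1 = [[11, 33, 12], [94, 67, 67]]
chi2, contributions = chi_square(table_1)

CHI2_CRITICAL_01_DF2 = 9.210  # tabled critical value, .01 level, 2 df
print(round(chi2, 2), chi2 > CHI2_CRITICAL_01_DF2)
print(round(contributions[0][1] / chi2, 2))  # share from Neutral discriminating cell
```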

The next logical step appeared to be an 
examination of the graphic item counts for 
the Neutral category. This examination re- 
vealed that these so-called Neutral items had, 
in fact, not been rated “3” by the majority of 
subjects but, rather, had been assigned rat-
ings ranging from Highly Acceptable (“1”)
to Highly Unacceptable (“5”). It would ap- 
pear, then, that these statements, perceived 
by some as socially acceptable, by others as 
neither acceptable nor unacceptable, and by 
still others as socially unacceptable, might 
more accurately be labeled Ambiguous. 
Therefore, it might be assumed that a pro-
jective principle was operating.2

2 The possibility of “unclear” personality items serving as “miniature projective tests” was suggested by Gordon (3).
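
The distinction drawn above, between an item that is Neutral because most raters mark it 3 and one that is Neutral only on average, can be made concrete with a small sketch. The two sets of ratings are invented for illustration, and the function name `rating_profile` is hypothetical; the study itself relied on inspection of graphic item counts.

```python
from collections import Counter


def rating_profile(ratings):
    """Return (mean, share of raters choosing 3, counts for ratings 1..5)."""
    counts = Counter(ratings)
    n = len(ratings)
    mean = sum(ratings) / n
    share_neutral = counts[3] / n
    return mean, share_neutral, [counts[k] for k in range(1, 6)]


# Two invented items, each rated by ten raters, both with a mean of 3.0.
consensual = [3] * 8 + [2, 4]                  # most raters mark "3"
ambiguous = [1, 1, 1, 2, 3, 3, 4, 5, 5, 5]     # ratings spread from 1 to 5

for name, ratings in (("consensual", consensual), ("ambiguous", ambiguous)):
    print(name, rating_profile(ratings))
```

Both items would fall in the Neutral category by their Preference Index, yet only the first is neutral in the sense of agreement among raters.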







The results of the preliminary investigation 
of Form AA indicated that it was not doing 
the job for which it had been designed. The 
mean score obtained by the achievers was 
the same, within a fraction of a point, as that 
obtained by the nonachievers. Form ML, 
on the other hand, appeared to show promise 
in the preliminary run. In the subsequent 
study, this form was found to have substan-
tial validity (biserial r of .66 and .61 for test
and retest, respectively) with equated sam-
ples of 50 achievers and 50 nonachievers in
the test situation and 46 achievers and 46
nonachievers in the retest situation. Test-
retest reliability was .76 ± .04; SEmeas. was
3.17, with the possible total score range of
-24 to +24. An item analysis of Form
ML revealed that, in general, items which 
discriminated in the questionnaire format 
either held up or became more valid in this 
forced-choice format. 

The results of the biasability study with 
Form ML did not warrant detailed statistical 
treatment, the mean difference between scores 
obtained under standard instructions and 
those obtained under instructions to bias 
being -0.36.


Summary 


The study reported here was based, in part, 
upon a larger one which concerned itself with 
the development of a forced-choice inventory 
of attitudes for predicting scholastic achieve-
ment in college. The purpose of this paper
was to report the findings on the relationship 
between Discrimination Indices and Prefer- 
ence Indices, with special emphasis on am- 
biguous items. The results may be summa- 
rized as follows: (a) In questionnaire format, 
Ambiguous statements were more valid than 
either Positive or Negative statements for 
differentiating college achievers from non- 
achievers. (b) In general, the validity of 
Ambiguous items either held up or increased 
in forced-choice format. (c) The 12-tetrad 
inventory consisting almost exclusively of 
Ambiguous items was found to have substan-
tial reliability and validity for the purpose
used, and did not appear to lend itself to 
willful misrepresentation on the part of the 
subjects. It was suggested that, in the use 
of Ambiguous statements of opinion toward 
school experience, a projective principle is 
called into operation. Furthermore, it is 
highly probable that the very ambiguity of 
the statements accounts for the failure of the 
subjects to intentionally bias (i.e., increase) 
their scores. 

In addition to more cross-validation stud- 
ies, structured interviews with the students 
who took part in the study may help to shed 
more light on the nature of the statements as 
well as on the reasons for the responses they 
evoke. Implications for counseling or atti- 
tudinal orientation are manifest. 


Received February 13, 1956. 
Early Publication. 


References 


1. Flanagan, J. C. The critical incident technique. Psychol. Bull., 1954, 51, 327-358.

2. Gekoski, N., & Isard, Eleanore S. Note on another use of the sentence-completion technique. J. appl. Psychol., 1955, 39, 139.

3. Gordon, L. V. Some interrelationships among personality item characteristics. Educ. psychol. Measmt, 1953, 13, 264-272.

4. Highland, R. W., & Berkshire, J. R. A methodological study of forced-choice performance ratings. USAF, Personnel Train. Res. Cent., Res. Bull., 1951, No. 51-9.

5. Isard, Eleanore S. The development of a forced-choice inventory of attitudes toward school experience for predicting scholastic achievement in college. Dissertation Abstr., 1955, 15, No. 8.

6. Lanman, R. W., & Remmers, H. H. The “preference” and “discrimination” indices in forced-choice scales. Educ. psychol. Measmt, 1954, 14, 541-551.

7. Parris, H. L. A comparative study of forced-choice and check-list ratings of Air Force R.O.T.C. instructors. Unpublished doctor’s dissertation, Ohio State Univer., 1951.

8. Rundquist, E. A. The forced-choice technique and rating scales. Paper read at Amer. Psychol. Ass., Philadelphia, September, 1946.

9. Sisson, E. D. Forced-choice: the new Army rating. Personnel Psychol., 1948, 1, 365-381.





The Journal of Applied Psychology 
Vol. 40, No. 4, 1956 


Using “Mark Sense” for Ratings and Personal 
Data Collection 


Bernard M. Bass 


Louisiana State University 


and Cecil R. Wurster 


Division of Research, Louisiana Department of Institutions 


The use of IBM “mark sense” cards has 
been adopted by the newly reorganized Divi- 
sion of Research of the Louisiana State De- 
partment of Institutions in its various report- 
ing and research projects. Mark-sensing is a 
procedure by which a specially printed IBM 
card (see Fig. 1) is marked with an electro- 
graphic pencil to indicate specific given in- 
formation. Through the use of the Mark 
Sensing Reproducer these marks are auto- 
matically converted into punched holes in the 
card corresponding to the relative positions 
of the pencil marks. The punched hole thus 
assumes the same numerical value as the 
mark which was originally made. This pro-
cedure permits the collection of record or re-
search data on IBM cards at the source of
the data without any additional clerical or
key punch work. Such a procedure elimi-
nates all possible sources of clerical error in
transferring data from original record to
finally punched IBM card.

All types of routine data on several hundred 
variables are now gathered on each of the 
100,000 yearly admissions to the state’s men- 
tal, tuberculosis and general hospitals, guid- 
ance centers, and correctional institutions by 
means of mark sense cards filled out by ap- 
propriate personnel at each institution. Dis- 
charge and follow-up cards provide a detailed 
picture of each patient’s institutional history 
which can be collated immediately, with no 
clerical labor, with the information obtained 
when the patient was admitted. Twenty- 
seven mark-sense columns are available on 
each card. Double marking of columns per- 
mits as many as 36 alternative responses per 
column. An example of a tuberculosis pa- 
tient admission card is shown in Fig. 1 to 
illustrate the types of classification possible. 
This card has been marked and processed
through the reproducer to illustrate the con-
version of marks to punches. (All heavy bars
indicate columns which are to be “double
marked.”)

Figure 2 is a correctional school exit card 
and illustrates the use of three-point rating 
scales in conjunction with personal data col- 
lection. 

More specific uses also are being made of 
mark sensing. Each week, as part of an ex- 
tensive, controlled study by the Department 
of Institutions Staff Committee on Mental 
Health Research to evaluate the effects of 
thorazine and reserpine therapies in the treat-
ment of mental patients, every one of ap-
proximately 400 patients at one hospital and
300 at each of two other hospitals is being
“mark sense” rated by from one to four phy-
sicians, nurses, and attendants on 32 items of
behavior. These “mark sense” SELH Rat- 
ing Scale Cards, developed by Frederick 
Hine, psychiatrist, and Joseph Dawson, clini- 
cal psychologist, are marked directly by the 
raters. Patient codes are prepunched into 
the cards for collation purposes. Little cleri- 
cal work precedes or follows the data collec- 
tion to obtain final research results. 

Statistical analyses can be prepared on 
IBM machines directly from the punched 
cards. For example, from the behavior rat- 
ing cards described above, an analysis of the 
agreement between the varying numbers of 
raters, using the Horst reliability formula 
(1), for each of the 32 items on each of six 
successive weeks was computed by an IBM 
604 Electronic Computer. (This computer 
performs all basic arithmetical computations 
—addition, subtraction, multiplication, and 
division.) The final 192 Horst reliability 
coefficients were obtained for a varying sam-
ple of 100 to 150 patients about 12 work
hours after the ratings were made at a cost
of $153.00. (By hand calculator, the same
analysis would have required around 600
days.) While these computations were in
progress, machine duplications of the ratings
were ready for various other types of analy-
sis by IBM equipment.

Fig. 1. IBM mark sense Tuberculosis Patient Admission Card showing electrographic pencil marks converted into punched holes.

“Mark sense” lends itself to self-rating,
test response scoring, and sociometric data
collection. The senior author collected the
rank order solutions by 300 subjects to each
of 12 problems. Self-ratings and buddy rat-
ings were also collected. Desired analyses of
the data which would have required 40 years’
steady work by hand calculator were com-
pleted in approximately three working days
following the data collection. The IBM 650,
a computer with greater speed and storage
capacity than the IBM 604, carried out the
calculations. (The 604’s “instructions” are
changed mainly by rewiring; the 650’s in-
structions are changed mainly by a punched
deck of cards. Once the deck is assembled,
the 650 runs itself.)

By making use of mark sense procedure 
where he now collects data on printed forms, 
the applied psychologist might find his re- 
search speeded up immediately with no loss 
in accuracy and at a considerable savings in 
clerical costs. Since each response can be 
labeled and defined as desired just below the 
space to be marked, the procedure obviates 


Fig. 2. IBM mark sense Correctional School Exit Card showing the use of rating scales in combination
with personal history data.










the need for translating classifications into 
codes to be punched—one of the main sources 
of error in traditional “printed form-to-IBM 
key punch” operations. Translation is neces- 
sary only for classifications involving more 
than 12 alternatives which are mutually ex- 
clusive. 

Access to modern high-speed computers 
multiplies the value of “mark sensing.” We 
no longer need worry about the expense and 




difficulty of coding and key punching large 
volumes of raw data, ordinarily required be- 
fore we can take advantage of the computers. 


Received October 13, 1955. 


Reference 
1. Horst, P. A generalized expression for the reli- 
ability of measures. Psychometrika, 1949, 
14, 21-31. 





The Journal of Applied Psychology 
Vol. 40, No. 4, 1956 


The Application of Temporal Correlation Techniques 
in Psychology 


W. Jay Merrill, Jr.1 and Corwin A. Bennett 


International Business Machines Corporation, Endicott, New York 


For a number of years methods have ex- 
isted and been applied for correlating vari- 
ables displaced in time from each other. Re- 
cent studies suggest that such methods of 
correlating over time will be of increasing 
importance in psychology during the next 
few years. Time as a variable in psychology 
has, with few exceptions, been the domain of 
investigations usually designated under the 
broad classifications of “learning” and “fa- 
tigue.” In experimentation not specifically 
concerned with these topics, changes in be- 
havior over time have generally been “prob- 
lems”—factors to be eliminated by some 
(sometimes devious) means. Consider the 
following hypothetical example. 

A psychologist is investigating the effects 
of illumination on productivity of beginning 
punch-press operators. Plotting the perform- 
ance of one operator under one condition of 
illumination against days on the job, he finds 
the relationship shown in Fig. 1, Curve A. 
If the investigator is concerned with learning 
as it affects productivity, he may be disap- 
pointed at the irregularities in his data, and 
attempt to reduce these irregularities by av- 
eraging the productivity of several operators 
to obtain a “smooth curve.” If he is not 
concerned with learning in his investigation, 
he will probably average together the several 
days on the job in order to eliminate the 
problem of the time variable. These are com- 
mon procedures and not necessarily worthy 
of deprecation. Such procedures may, how- 
ever, serve to preclude the discovery of im- 
portant behavioral relationships. 

To return to the hypothetical psychologist, 
suppose that he attempts to fit a “learning 
curve” to his data. He might find that a 
curve of the form, $P = k_1(1 - e^{-k_2 t})$ (where
$k_1$ and $k_2$ are empirical constants), would


1 The senior author wishes to express his gratitude 
to P. M. Fitts of Ohio State University for support 
during the early stages of the writing of this paper. 




suffice (Curve B, Fig. 1). This would usu- 
ally be the final analysis of such data. If, 
however, the deviations from the fitted curve 
were plotted against trials, Curve C, Fig. 2, 
would result. An imaginative investigator 
might look at this plot and think that there 
were periodicities or cycles of performance 
present. He might then fit a second curve 
of the form, $P = k_3 \sin(k_4 t)$, where $k_3$ and $k_4$
are constants, shown as Curve D, Fig. 2. Depending
on many factors, some psychological explanation
for such a periodicity might be suggested.
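
The two-stage fit described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic, the function names are hypothetical, and the grid search over $k_2$ is a convenience chosen here, since the text does not say how the learning curve would be fitted. For any fixed $k_2$ the model is linear in $k_1$, so $k_1$ has a closed-form least-squares value.

```python
import math

def fit_learning_curve(days, output, k2_grid):
    """Least-squares fit of P = k1 * (1 - exp(-k2 * t)), scanning candidate k2 values."""
    best = None
    for k2 in k2_grid:
        basis = [1.0 - math.exp(-k2 * t) for t in days]
        denom = sum(b * b for b in basis)
        if denom == 0.0:
            continue
        # For a fixed k2 the model is linear in k1, so k1 has a closed form.
        k1 = sum(b * p for b, p in zip(basis, output)) / denom
        sse = sum((p - k1 * b) ** 2 for b, p in zip(basis, output))
        if best is None or sse < best[2]:
            best = (k1, k2, sse)
    return best

def residuals(days, output, k1, k2):
    """Deviations of the observed series from the fitted learning curve."""
    return [p - k1 * (1.0 - math.exp(-k2 * t)) for t, p in zip(days, output)]

# Hypothetical data: a learning trend plus a small seven-day ripple.
days = list(range(1, 31))
output = [10.0 * (1.0 - math.exp(-0.2 * t))
          + 0.5 * math.sin(2.0 * math.pi * t / 7.0) for t in days]

k2_grid = [0.01 * j for j in range(1, 101)]
k1, k2, sse = fit_learning_curve(days, output, k2_grid)
deviations = residuals(days, output, k1, k2)  # candidate stationary series
```

The list `deviations` plays the role of the deviations from the fitted curve; a periodic function could then be fitted to it, or it could be examined with the correlation techniques defined in the next section.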

The original hypothetical data are a time 
series. The deviations from the first fitted 
curve are a stationary time series since any 
“over-all tendencies” or trends have been re- 
moved from the data by the first curve-fitting 
process. The deviations from the second 
fitted curve are often treated as “random 
fluctuations.” Temporal correlation tech- 
niques are means of finding relationships be- 
tween some sort of performance and time for 
a stationary time series.* Not only cyclical 
relationships as in the present case but noncyclical
temporal phenomena may be discovered
by temporal correlation techniques.


Definitions 


Temporal correlation techniques may be 
divided into two general classes: (a) discrete 
serial correlations, usually appropriate for psychological
data; (b) continuous correlations, such as the
auto- and crosscorrelation functions
common at present in engineering applications.
Each of these classes of temporal correlation 
may be further subdivided into two analogous 
classes: (i) autocorrelation—correlation of a 
variable with itself displaced in time; (ii) cross- 


*A time series may be stationary either because 
no trends were present to begin with or because 
trends have been statistically removed. For meth- 
ods of removing trends see Kendall (13, Ch. 29). 







Fig. 1. Hypothetical productivity time series and learning curve. Ordinate: productivity, hundreds of pieces; abscissa: days on the job.























Fig. 2. Deviations from hypothetical productivity learning curve and periodic function. Ordinate: productivity, hundreds of pieces; abscissa: days on the job.










correlation—correlation of one variable with a 
second variable displaced in time.³

In terms of the familiar Pearson product-moment correlation
formula, the serial autocorrelation for a particular time
displacement, $\tau$, would be:

$$r_{ac} = \frac{\sum x_i x_{i+\tau}}{N_\tau \sigma_x^2} \tag{1}$$

and the serial crosscorrelation would be

$$r_{cc} = \frac{\sum x_i y_{i+\tau}}{N_\tau \sigma_x \sigma_y} \tag{2}$$

where $x_i$ and $y_i$ are the deviation values of the
respective time series at predetermined points,
$i$, along the time axis and $N_\tau$ indicates the
number of these points. As $\tau$ is increased, for
a given time series, the number of available
products decreases so that when $1 + \tau_N$ indicates
the last value in the series, $x_N$, there is
only one product, $x_1 x_N$, in the numerator with
which to estimate the value of the autocorrelation
with displacement $\tau_N$. Crosscorrelation
works in the same way except that the displacement,
$\tau$, is added to the series of $y$ values rather
than to the $x$ series, and the last product is
$x_1 y_N$.⁴ In practice, of course, auto- and crosscorrelations
would not be computed for displacements
so large that $N$ was very small.
It is apparent that these serial correlations (as
any product-moment coefficients) will vary between
±1.00. Usual significance tests of product-moment
r are appropriate.⁵
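
A minimal sketch of Formulas (1) and (2) may make the bookkeeping concrete. The function names and the short test series are hypothetical; following the formulas as written, the standard deviations are estimated once from the whole series, while the sum runs over only the products available at the given displacement.

```python
import math

def deviation_values(series):
    """Subtract the series mean so that values are deviation scores."""
    mean = sum(series) / len(series)
    return [v - mean for v in series]

def serial_crosscorrelation(x, y, lag):
    """Formula (2): products x_i * y_(i+lag), summed over the N_tau available
    pairs and divided by N_tau * sigma_x * sigma_y (sigmas from the full series)."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    dx, dy = deviation_values(x), deviation_values(y)
    n_tau = len(dx) - lag
    if lag < 0 or n_tau <= 0:
        raise ValueError("lag must be non-negative and shorter than the series")
    sigma_x = math.sqrt(sum(v * v for v in dx) / len(dx))
    sigma_y = math.sqrt(sum(v * v for v in dy) / len(dy))
    total = sum(dx[i] * dy[i + lag] for i in range(n_tau))
    return total / (n_tau * sigma_x * sigma_y)

def serial_autocorrelation(x, lag):
    """Formula (1): the crosscorrelation of a series with itself."""
    return serial_crosscorrelation(x, x, lag)

# Hypothetical series with a period of four observations.
x = [1.0, 0.0, -1.0, 0.0] * 10
r_lag0 = serial_autocorrelation(x, 0)
r_lag1 = serial_autocorrelation(x, 1)
r_lag2 = serial_autocorrelation(x, 2)
r_lag4 = serial_autocorrelation(x, 4)

# The same series delayed by two observations.
y = x[-2:] + x[:-2]
r_cross_lag2 = serial_crosscorrelation(x, y, 2)
```

For this series the autocorrelation returns to its maximum at a displacement of one full period and reaches its minimum at half a period, which is the pattern a plot against lag would show for a purely periodic series.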


3 There has been considerable confusion of termi- 
nology in this area. Kendall (13, p. 402) uses “serial 
correlation” to mean what is here called “serial auto- 
correlation.” Other writers, such as Anderson (2), 
have used “autocorrelation” as equivalent to the 
present “serial autocorrelation” and “serial correla- 
tion” as equivalent to “serial crosscorrelation.” An- 
derson also quotes Yule as using “lagged serial cor- 
relation” as equivalent to “serial autocorrelation.” 
Some of the difficulty has arisen because these writers 
have been concerned only with temporal correlation 
of discrete data and thus have no need to distin- 
guish these from the correlation functions. 

4 Formulas (1) and (2) imply the calculation of
one estimate of $\sigma_x^2$, $\sigma_x$, and $\sigma_y$ for all displacements,
$\tau$. As $N_\tau$ becomes small it is probably desirable to
estimate these parameters by using only those terms
of the series which are used in the cross products.
See Kendall’s formula 30.7 (13, p. 402).

5 Anderson (1, 2) and Hannan (11) discuss certain
special significance problems and tests connected
with serial correlations. Hoel (12) lists a non- 
parametric test for “temporal relatedness” or “runs.” 




If, in the case of discrete data, the standard
deviation term in the formula for autocorrelation
is dropped, a covariance form results.⁶
This form is an approximate autocorrelation
function for continuous data:

$$\phi_{xx}(\tau) \approx \frac{1}{N_\tau} \sum_{i=1}^{N_\tau} x_i x_{i+\tau}, \tag{3}$$

which is asymptotic to the autocorrelation
function

$$\phi_{xx}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\,x(t+\tau)\,dt, \tag{4}$$

where instead of summing $N_\tau$ deviation value
products, the product of two functions over
the time interval, $T$, is integrated. Thus, if a
continuous time series, $x(t)$, was made discrete
by using only certain points, calling these
values $x_i$, the analogous discrete autocorrelation
function would be obtained.

Similarly, the discrete crosscorrelation function
may be defined as

$$\phi_{xy}(\tau) = \frac{1}{N_\tau} \sum_{i=1}^{N_\tau} x_i y_{i+\tau}, \tag{5}$$

and the continuous form as

$$\phi_{xy}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\,y(t+\tau)\,dt. \tag{6}$$


Since Equations 4 and 6 require an averaging
process in time, they are called time averages.
The autocorrelation function is a continuous
symmetrical function with a maximum at
$\tau = 0$.⁷ In the absence of periodicities, the
function is asymptotic to the square of the
mean of the “random function.” Therefore, if
the mean is zero the autocorrelation function
tends to zero as the displacement, $\tau$, tends
to infinity. Autocorrelation for a time series
without periodicities (a random function) might
look like Fig. 3.
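
The behavior just described can be checked numerically with the discrete form of Equation 3. The sketch below is an illustration only: the series is synthetic pseudo-random noise around a hypothetical mean of 2.0, the raw values (not deviation values) are used so that the mean shows up in the result, and the function name is invented for the example.

```python
import random

def autocorrelation_function(x, lag):
    """Discrete form of Equation 3: the mean of the N_tau available products
    x_i * x_(i+lag). No standard deviation term is applied, so the values
    are not confined to the range -1 to +1."""
    n_tau = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n_tau)) / n_tau

# Hypothetical "random function": independent noise around a mean of 2.0.
rng = random.Random(0)
mean_level = 2.0
series = [mean_level + rng.uniform(-0.5, 0.5) for _ in range(5000)]

phi = [autocorrelation_function(series, lag) for lag in range(50)]
largest_off_origin = max(phi[1:])
```

With no periodicity present, the value at the origin is the largest and every other value lies close to the square of the mean (here about 4.0), in line with the statement that the function is asymptotic to the squared mean.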

6 The covariance form might actually be used in
practice for a given time series since the standard 
deviation terms would be relatively constant and the 
appearance of the plotted correlation functions 
would not be affected. 

7 The restrictions implied for the autocorrelation
function in communication engineering are that it be
a damped even function with the maximum at the
origin. Such properties need not be the case in
other areas of application, although the function
will approach evenness as $T \to \infty$.










Fig. 3. Hypothetical autocorrelation function plot depicting a random function. Ordinate: $\phi_{xx}(\tau)$; abscissa: displacement, $\tau$.

Crosscorrelation is a measure of coherence
between two functions. For independent random
functions the crosscorrelation function is
a constant which is the product of the individual
mean values of the functions. Thus, if
one mean were zero the crosscorrelation would
be zero everywhere. This is called incoherence
and is reminiscent of Pearson r, which has a
zero value under analogous conditions.

Without resorting to a mathematical demonstration,
it may be pointed out that the autocorrelation
function, $\phi_{xx}(\tau)$, of a periodic function,
$x(t)$, is periodic itself, retains the fundamental
frequency and harmonics of $x(t)$, but
drops all phase angles. The crosscorrelation
function, $\phi_{xy}(\tau)$, retains the fundamental
frequency only if both $x(t)$ and $y(t)$ contain it,
and retains only those harmonics which are
present in both along with their phase differences.
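
For the simplest case the statement can be verified directly. The following worked example is an illustration added under the assumption of two single sinusoids of the same frequency; the symbols $A$, $B$, $\omega$, $\theta_1$, and $\theta_2$ are introduced here and do not appear in the original.

```latex
% Assumed signals: x(t) = A cos(wt + th1), y(t) = B cos(wt + th2).
\begin{aligned}
x(t)\,x(t+\tau)
  &= A^2 \cos(\omega t + \theta_1)\,\cos(\omega t + \omega\tau + \theta_1) \\
  &= \tfrac{A^2}{2}\bigl[\cos\omega\tau
     + \cos(2\omega t + \omega\tau + 2\theta_1)\bigr], \\
\phi_{xx}(\tau)
  &= \lim_{T\to\infty}\frac{1}{T}\int_0^T x(t)\,x(t+\tau)\,dt
   = \tfrac{A^2}{2}\cos\omega\tau, \\
\phi_{xy}(\tau)
  &= \lim_{T\to\infty}\frac{1}{T}\int_0^T x(t)\,y(t+\tau)\,dt
   = \tfrac{AB}{2}\cos(\omega\tau + \theta_2 - \theta_1).
\end{aligned}
```

The term in $2\omega t$ averages to zero over a long interval. The autocorrelation keeps the frequency $\omega$ but contains neither phase angle, while the crosscorrelation keeps only the phase difference $\theta_2 - \theta_1$.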











Fig. 4. Hypothetical autocorrelation function plot depicting a periodic function plus randomness. Regions of the curve are labeled “random + periodic component” and “random component”; abscissa: displacement, $\tau$.







Figure 4 shows $\phi_{xx}(\tau)$ for a sine function
plus randomness in contrast to Fig. 3, the auto- 
correlation function of a random function alone. 
A crosscorrelation graph of two sine functions 
would not show the random component, nor 
necessarily have the same period, but the gen- 
eral character of the curve would be preserved. 
Phase differences would tend to keep one of the 
maxima from lying on the ordinate. 

If periodicities are present in the time series,
it is evident that large enough values of $\tau$ need
to be taken in autocorrelation so that the random
influences approach zero as in Fig. 4.
Thus, if $x(t)$ is a mixture of periodic and random
components, then

$$x(t) = x_p(t) + x_r(t).$$

By application of the definition of $\phi_{xx}(\tau)$, Equation 4,
the autocorrelation of $x(t)$ is

$$\phi_{xx}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T [x_p(t) + x_r(t)]\,[x_p(t+\tau) + x_r(t+\tau)]\,dt.$$

When the two binomials are multiplied together
and the limits of integration applied to
each term, the result is

$$\phi_{xx}(\tau) = \phi_{x_p x_p}(\tau) + \phi_{x_p x_r}(\tau) + \phi_{x_r x_p}(\tau) + \phi_{x_r x_r}(\tau).$$

The first and last terms on the right are the
autocorrelation functions for the periodic performance
and the random function, respectively.
The center terms are their crosscorrelation
function. As a matter of convenience
let the means of $x_p(t)$ and $x_r(t)$ be zero. The
function $\phi_{x_r x_r}(\tau)$ is nonperiodic and goes to
zero as $\tau$ approaches infinity. Due to incoherence,
$\phi_{x_p x_r}(\tau)$ and $\phi_{x_r x_p}(\tau)$ vanish, leaving
$\phi_{x_p x_p}(\tau)$ as the value of $\phi_{xx}(\tau)$, showing that
the periodic component is responsible for the
correlation as $\tau$ approaches infinity.
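
A small numerical check of this decomposition is sketched below. Everything specific in it is an assumption made for illustration: a unit-amplitude sine with a period of 20 samples, independent Gaussian noise of unit variance, and a discrete mean-of-products estimate in place of the time average. For a sine of amplitude $A$ the periodic term is $(A^2/2)\cos\omega\tau$, so away from the origin the estimate should follow that curve.

```python
import math
import random

def autocorrelation_function(x, lag):
    """Mean of the available products x_i * x_(i+lag) (covariance form)."""
    n_tau = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n_tau)) / n_tau

# Hypothetical mixture: x(t) = x_p(t) + x_r(t), both with zero mean.
rng = random.Random(1)
amplitude, period, n = 1.0, 20, 4000
periodic = [amplitude * math.sin(2.0 * math.pi * i / period) for i in range(n)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
mixed = [p + r for p, r in zip(periodic, noise)]

lags = range(61)
phi_mixed = [autocorrelation_function(mixed, lag) for lag in lags]

# Contribution expected from the periodic component alone.
expected_periodic = [0.5 * amplitude ** 2 * math.cos(2.0 * math.pi * lag / period)
                     for lag in lags]
largest_gap = max(abs(phi_mixed[lag] - expected_periodic[lag])
                  for lag in range(1, 61))
```

At zero displacement the noise variance is added to the periodic term, which is the distortion near the origin shown in Fig. 4; at every other displacement in this sketch the estimate stays close to the cosine contributed by the periodic component.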

Similarly, the crosscorrelation can be shown
to be

$$\phi_{xy}(\tau) = \phi_{x_p y_p}(\tau).$$

Crosscorrelation has the advantage of not being
distorted around $\tau = 0$ because $\phi_{x_r y_r}(\tau)$ is
incoherent, and, with zero mean assumption,
vanishes everywhere.

There is no method available for testing significance
of the correlation functions. Usually
this will present little difficulty, for if the continuous
form is calculated over satisfactorily
long $T$ or if the discrete form is calculated for
large $N$, significance may generally be assumed
for moderate or large $\phi(\tau)$ at the maxima.

A method of analysis which is closely related
to the autocorrelation function is the power
density spectrum, $\Phi_{xx}(\omega)$. The functions,
$\phi_{xx}(\tau)$ and $\Phi_{xx}(\omega)$, are Fourier cosine
transforms of each other. That is,

$$\phi_{xx}(\tau) = \int_{-\infty}^{\infty} \Phi_{xx}(\omega) \cos \omega\tau \, d\omega$$

$$\Phi_{xx}(\omega) = \frac{1}{\pi} \int_{0}^{\infty} \phi_{xx}(\tau) \cos \omega\tau \, d\tau.$$

Thus, when either $\phi(\tau)$ or $\Phi(\omega)$ is known, the
other may be found. In power density spectrum
analysis, power density is plotted against
frequency rather than time displacement.
This method is used frequently in studies of
human tracking behavior.⁸
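
The relation between the two functions can be illustrated with a rough discrete stand-in for the cosine transform. The sketch below makes several assumptions that are not in the text: the series is synthetic (a sine with a period of 20 samples plus Gaussian noise), the autocorrelation is truncated at 200 lags with no smoothing window, the constant scale factor is omitted, and only a short list of candidate periods is scanned. It is meant to show where the power concentrates, not to give calibrated power densities.

```python
import math
import random

def autocorrelation_function(x, lag):
    """Mean of the available products x_i * x_(i+lag) (covariance form)."""
    n_tau = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n_tau)) / n_tau

def cosine_transform(phi, omega):
    """Unscaled discrete cosine transform of phi(tau): lag 0 is counted once
    and each positive lag twice, because phi is an even function."""
    return phi[0] + 2.0 * sum(phi[lag] * math.cos(omega * lag)
                              for lag in range(1, len(phi)))

# Hypothetical series: a sine of period 20 samples buried in noise.
rng = random.Random(3)
period = 20
series = [math.sin(2.0 * math.pi * i / period) + rng.gauss(0.0, 1.0)
          for i in range(4000)]

phi = [autocorrelation_function(series, lag) for lag in range(201)]

candidate_periods = list(range(5, 61))
spectrum = [cosine_transform(phi, 2.0 * math.pi / p) for p in candidate_periods]
best_period = candidate_periods[spectrum.index(max(spectrum))]
```

Plotting `spectrum` against frequency rather than `phi` against displacement gives the alternative view described above; in this sketch the largest value falls at the period of the buried sine.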


Computation 


For brevity and ease of presentation only 
autocorrelation computation will be discussed. 
The only difference in computing the cross- 
correlation lies in reading one value from the 
curve of one time series and the $\tau$-displaced
value from the other. 

Calculation of serial autocorrelation is
straightforward Pearson correlation procedure.
The original set of data will constitute the
variable $X$. From this set a new variable $X'$
will be constructed such that the second score
of the original set is now the first score of the
new set, the third score is now the second, and
so on. By using any of the methods of computation
for Pearson r on these two sets, $X$
and $X'$, a serial autocorrelation with “lag one”
is calculated. Serial autocorrelations of greater
than lag one may be calculated in a similar
manner by constructing new sets of scores,
$X''$, $X'''$, ..., by displacing the scores correspondingly.
By plotting these serial autocorrelation
values versus the lag, periodicities
and the random function may be noted.⁹

8 Another method used for studying temporal phenomena
involves certain information measures (5,
10, 18, 19, 24, 25). For other general discussions
of autocorrelation functions and related techniques
see References 17, 27, and 29.

9 Other coefficients, where applicable, might be
used in place of Pearson r; for instance, phi (8),
and tetrachoric r. Indeed, Wertheimer (28) used
tetrachoric coefficients in one of his studies. However,
such statistics have the same limitations in
this application as in any other. Chapanis (5) has
used chi square.

Autocorrelations for continuous data are not
readily calculated by hand except by changing
such data into discrete form and then treating
as above. Machine methods of calculating
autocorrelation functions do exist, however.

If an analog autocorrelator is available it
will follow these steps: (a) the function, $x(t)$,
is displaced by a small interval, $\tau$, resulting in
$x(t + \tau)$; (b) these two functions are continuously
multiplied; (c) the product is integrated
(continuously added); and (d) the average
value of the integral is taken over the interval
of integration. This entire procedure would
then be repeated for other values of $\tau$.

If a digital computer is available, a second
machine method of calculation of the autocorrelation
function is utilized. This method
is essentially one of changing the form of the
data from continuous to discrete and calculating
a serial approximation to the autocorrelation
function. Analogous to the assumption
that discrete samples are from a continuous
distribution (a common assumption in statistics),
the sequence of discrete points in the
temporal series is likewise assumed to be an
accurate representation of the continuous functions
that make up the time series. The procedure
follows: (a) the function, $x(t)$, is divided
into sections of duration $L$ (see Fig. 5); these
sections are chosen such that, when periodicities
are present, the junction points of the sections
do not always occur at fixed locations relative
to the periodicities;¹⁰ (b) the $a_i$ values are
determined by evaluating $x(t)$ at the $L$ junctions,
the $b_i$ values at a constant time $\tau$ after
the $a_i$'s; (c) corresponding $a$'s and $b$'s are multiplied
and summed; and (d) the sum is divided
by the total number of products. Again, the
procedure would be repeated for other values
of $\tau$.

Fig. 5. Subdivision of a continuous time series to obtain discrete data. Abscissa: time.
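
Steps (a) through (d) of the digital procedure translate almost line for line into code. The sketch below is an illustration under assumptions of its own: the continuous series is represented by a Python function, the example wave (a sine with a period of 10 time units) and the section length of 0.7 units are hypothetical, and the function name is invented for the example.

```python
import math

def sampled_autocorrelation(x, section_length, tau, n_sections):
    """Discrete approximation to the autocorrelation function of x(t)
    at displacement tau, following steps (a) through (d)."""
    # (a), (b): a_i at the section junctions, b_i a constant time tau later.
    a = [x(i * section_length) for i in range(n_sections)]
    b = [x(i * section_length + tau) for i in range(n_sections)]
    # (c): multiply corresponding a's and b's and sum the products.
    total = sum(a_i * b_i for a_i, b_i in zip(a, b))
    # (d): divide by the total number of products.
    return total / n_sections

def wave(t):
    """Hypothetical continuous series: a sine with a period of 10 time units."""
    return math.sin(2.0 * math.pi * t / 10.0)

# Section length 0.7 is shorter than the period, and the junctions fall at
# many different locations relative to the cycle.
phi_zero = sampled_autocorrelation(wave, 0.7, 0.0, 1000)
phi_quarter = sampled_autocorrelation(wave, 0.7, 2.5, 1000)
phi_half = sampled_autocorrelation(wave, 0.7, 5.0, 1000)
```

For a unit-amplitude sine the autocorrelation function is one half the cosine of the displacement measured in cycles, so the three values should come out near 0.5, 0, and -0.5. Repeating the call over a range of displacements gives the points of the plotted function.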

Steps c and d of this process may be expressed
by the formula

$$\frac{\sum_{i=1}^{N_\tau} a_i b_i}{N_\tau}.$$

Noting the correspondence of $a_i$ to $x_i$ and of
$b_i$ to $x_{i+\tau}$, this relationship is very much like
the original expression for $r_{ac}$ given by Equation 1.
The difference is that in Equation 1
the variance is present in the denominator.
10 More precisely, when choosing the number of
L's for discrete sampling of the function, the number
of samples should exceed slightly the number of
cycles of the highest frequency component, so that
$L < \min p$, where $p$ is the period of the highest
frequency. In general, the size of $L$ should be chosen
so that $x(t)$ doesn’t change appreciably during $L$.




This fact shows clearly that the autocorrelation
function as defined does not vary between +1
and −1. To accomplish this the next step
would be to divide the covariance by the product
of the standard deviations. In the case of
autocorrelation with large $N$, the $\sigma$'s would
approach equality because the same function
is responsible for the $a_i$ and $b_i$ values. In
crosscorrelation there will usually be a difference
in the two values of $\sigma$ and the two means.
The result of this standardizing process in
either case is the Pearsonian r. Here again
the r must be computed for each different value
of $\tau$ desired.

As the number of sections, $L$, gets larger the
calculated value approaches the value of the
autocorrelation function for the given $\tau$. For
exact equivalence the number of sections would
have to be infinite.

Occasionally some values of the autocorrelation
calculated for large $\tau$ may exceed previous
maxima. These are generally ignored since
they result from the small $N$ for large $\tau$. The
origin value may be exceeded for a $\tau$ with large
$N$ when a combination of periodic components
occurs in phase.


Applications 


Temporal correlation techniques have been 
applied in several areas of psychology thus 


far. It is envisioned that many more will be 
found. 

In the area of psychophysics several in- 
vestigations (8, 26, 28) have been carried 
out to demonstrate that successive trials in 
psychophysical experiments cannot represent 
samplings from a population of independent 
responses. Typically, these studies have ob- 
tained “yes-no” responses in threshold-deter- 
mination situations for successive trials. In- 
tertrial interval has been varied from a few 
seconds to as long as a day. Serial autocor- 
relations or significance tests corresponding 
to these correlations have given results much 
like the right half of the curve shown in Fig. 
3. The interpretation is that (a) the effects 
of one trial show up on successive trials in 
such determinations, and (b) as the displacement
between trials is increased, the strength
of the effect decreases. 

The crosscorrelation function and the power 




density spectrum have found extended use in 
studies of human tracking behavior (6, 7, 14, 
15, 16). Thus, the target is moved about in 
some fashion called the disturbance function. 
In (pursuit) tracking the follower is moved 
by the subject in response to the disturbance 
function. The subject’s response curve and 
the disturbance function are crosscorrelated. 
The crosscorrelation will generally be peri- 
odic and will have a maximum at some dis- 
placement such that the given response cor- 
responds to some earlier disturbance. This 
displacement is called the reaction time and 
generally equals about half a second. 
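
Locating the maximum of the crosscorrelation can be sketched as follows. The setup is entirely hypothetical: the disturbance is pseudo-random noise rather than a real target course, the simulated response is simply the disturbance delayed by five samples with added noise, and a sampling interval of 0.1 second is assumed so that the delay corresponds to half a second.

```python
import random

def crosscorrelation_function(x, y, lag):
    """Mean of the available products x_i * y_(i+lag) (discrete Equation 5)."""
    n_tau = len(x) - lag
    return sum(x[i] * y[i + lag] for i in range(n_tau)) / n_tau

rng = random.Random(2)
sample_interval = 0.1   # seconds per sample (assumed for this sketch)
true_delay = 5          # samples by which the simulated response lags
n = 2000

# Hypothetical disturbance function and a delayed, noisy response to it.
disturbance = [rng.gauss(0.0, 1.0) for _ in range(n)]
response = [(disturbance[i - true_delay] if i >= true_delay else 0.0)
            + rng.gauss(0.0, 0.5) for i in range(n)]

phi_xy = [crosscorrelation_function(disturbance, response, lag)
          for lag in range(21)]
best_lag = max(range(len(phi_xy)), key=lambda lag: phi_xy[lag])
reaction_time = best_lag * sample_interval
```

Here `best_lag` recovers the delay built into the simulated response, and multiplying by the sampling interval converts it to seconds. With real tracking records the maximum would be broader and, as noted above, the function would generally be periodic.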

Philpott (20, 21, 22) has investigated output
fluctuation in group performance of relatively
simple mental tasks such as substitution,
easy arithmetic, dotting, etc. He
discovered that output peaks occurred at 
predictable intervals and that the peaks 
tended to follow a pattern of generally in- 
creasing magnitude, which could be predicted 
by the simultaneous occurrences in phase of 
sine-like waves of different frequencies. Phil- 
pott’s investigations have, however, been 
criticized recently by Richardson (23) on 
statistical grounds. 

In a brief description of their work, Bar- 
low and Brazier (3) tell of studying “auto- 
correlation of spontaneous activity in the 
cortex.” They are also using the crosscorre- 
lation function “for the detection of responses 
in brain potentials evoked by experimental 
sensory stimuli.” 

There are many other areas of psychologi- 
cal research in which application of temporal 
correlation techniques might prove fruitful. 
Time and motion study might benefit from 
such analysis since definite “rhythms” or periodicities
in routine tasks may be useful in the
performance of such tasks. Studies of operant
conditioning, where periods of activity and
inactivity need specification and explanation,
are another possibility. A study by Bixenstine
(4) would seem to suggest that physiological 
measures such as palmar sweating have peri- 
odicities with a possible period of one week. 
Temporal correlation might be applied in a 
theoretical investigation of test-retest reli- 
ability where a curve of a so-called random 
function would presumably obtain. 













In a recent literature review, Fiske and 
Rice (9) have discussed a wide variety of 
studies of what they term “intra-individual 
variability.” In this review they are con- 
cerned with predictable differences between 
“responses [which] show no systematic trend 
over time” [stationary time series]. These 
writers point out the extreme importance of 
such behavior and also point out that little 
systematic effort has been devoted to the 
area, although many isolated studies have 
been carried out. One of the most striking 
features about these studies is the wide va- 
riety of methods used to determine the na- 
ture of such time-varying behavior. Fiske 
and Rice indicate the unsatisfactory nature 
of some of these methods. While they rather 
pointedly ignore the autocorrelation methods 
in their discussions of methodology, it would 
seem that in many such instances these tech- 
niques would be ideally suited to the prob- 
lem at hand. 

Hopefully, investigators in other specific 
areas might see the usefulness of the tech- 
niques in their specialties. Certainly, it is 
only through the enrichment of psychological 
methods by addition of such specific tech- 
niques that psychology can aspire to predict 
behavior to its fullest extent. 


Summary 


Definitions and computation procedures for 
various temporal correlation techniques are 
presented. These techniques include serial 
correlations for discrete data and correlation 
functions for continuous data. Specifically 
described are autocorrelations for temporal 
relatedness within one series of data, and 
crosscorrelations for such relatedness between 
two series. These techniques are appropriate 
for discovery of both cyclical and noncyclical 
temporal phenomena. Various applications of 
temporal correlation techniques within psy- 
chology are described. 


Received February 23, 1956. 
Early Publication. 
References 


1. Anderson, R. L. Distribution of the serial correlation
coefficient. Ann. math. Statist., 1942, 13, 1.

2. Anderson, R. L. The problem of autocorrelation
in regression analysis. J. Amer. statist. Ass., 1954, 49, 113-129.

3. Barlow, J. S., & Brazier, M. A. B. Correlation
studies of brain potentials. MIT Quart. Progr. Rep., April 1955, 79-82.

4. Bixenstine, V. E. A case study of the use of
palmar sweating as a measure of psychological
tension. J. abnorm. soc. Psychol., 1955, 50, 138-143.

5. Chapanis, A. Random-number guessing behavior.
Amer. Psychologist, 1953, 8, 332.

6. Clark, J. R., Fontaine, A. B., & Warren, C. E.
The generation of a continuous random signal
for use in human tracking studies. USAF,
Hum. Resour. Res. Cent., Res. Bull., 1953, No. 53-40.

7. Clark, J. R., & Warren, C. E. A photometric
correlator. USAF, Hum. Resour. Res. Cent.,
Res. Bull., 1953, No. 53-42.

8. Collier, G. Intertrial association at the visual
threshold as a function of intertrial interval.
J. exp. Psychol., 1954, 48, 330-334.

9. Fiske, D. W., & Rice, Laura. Intra-individual
response variability. Psychol. Bull., 1955, 52, 217-250.

10. Frick, F. C., & Miller, G. A. A statistical description
of operant conditioning. Amer. J. Psychol., 1951, 64, 20-36.

11. Hannan, E. J. Exact tests for serial correlation.
Biometrika, 1955, 42, 316-326.

12. Hoel, P. G. Introduction to mathematical statistics.
New York: Wiley, 1947.

13. Kendall, M. G. The advanced theory of statistics,
Vol. 2. London: Griffin, 1948.

14. Krendel, E. S. A preliminary study of the
power-spectrum approach to the analysis of
perceptual-motor performance. USAF, WADC
Tech. Rep., 1951, 6723.

15. Krendel, E. S. The spectral density study of
tracking performance: Part 1, The effect of
instructions. USAF, WADC Tech. Rep., 1952, No. 11.

16. Krendel, E. S. The spectral density study of
tracking performance: Part 2, The effects of
input amplitude and practice. USAF, WADC
Tech. Rep., 1952, No. 11.

17. Lee, Y. W., Cheatham, T. P., & Wiesner, J. B.
Application of correlation analysis to the detection
of periodic signals in noise. Proc. IRE, Oct. 1950.

18. Newman, E. B. Computational methods useful
in analyzing series of binary data. Amer. J.
Psychol., 1951, 64, 252-262.

19. Newman, E. B. The pattern of vowels and consonants
in various languages. Amer. J. Psychol., 1951, 64, 369-379.

20. Philpott, S. J. F. Fluctuations in human output.
Brit. J. Psychol. Monogr. Suppl., 1933, 6, No. 17.

21. Philpott, S. J. F. The curve of fluctuations in
mental output and the curve of numbers of
factors in the natural numbers. Brit. J. Psychol.,
1949, 39, 123-141.

22. Philpott, S. J. F. Fluctuations in mental output.
Quart. Bull. Brit. Psychol. Soc., 1950, 1, 264-280.

23. Richardson, L. F. Dr. S. J. F. Philpott’s wave
theory. Brit. J. Psychol., 1952, 43, 468-475.

24. Senders, Virginia. Further analysis of response
sequences in the setting of a psychophysical
experiment. Amer. J. Psychol., 1953, 66, 215-228.

25. Senders, Virginia, & Sowards, A. Analysis of
response sequences in the setting of a psychophysical
experiment. Amer. J. Psychol., 1952, 65, 358-374.

26. Verplanck, W. S., Collier, G. S., & Cotton, J. W.
Non-independence of successive responses in
measurements of the visual threshold. J. exp.
Psychol., 1952, 44, 273-282.

27. Wiener, N. The extrapolation, interpolation and
smoothing of stationary time series. New
York: Wiley, 1949.

28. Wertheimer, M. An investigation of the “randomness”
of threshold measurements. J. exp.
Psychol., 1953, 45, 294-303.

29. Wise, J. The autocorrelation function and the
spectral density function. Biometrika, 1955, 42, 151-159.











